					Guerilla Open Access Cookbook 24 January 2013
http://cryptome.org/2013/01/guerilla-open-access-cookbook.htm




Date: Thu, 24 Jan 2013 15:47:32 +0100
From: Eugen Leitl <eugen[at]leitl.org>
To: zs-p2p[at]googlegroups.com, info[at]postbiota.org, cypherpunks[at]al-qaeda.net
Subject: Guerilla open access cookbook

----- Forwarded message from Bryan Bishop <kanzure[at]gmail.com> -----

From: Bryan Bishop <kanzure[at]gmail.com>
Date: Wed, 23 Jan 2013 20:29:28 -0600
To: science-liberation-front[at]googlegroups.com,
Bryan Bishop <kanzure[at]gmail.com>, kfogel[at]red-bean.com,
williwaw[at]luxobscura.org
Subject: Guerilla open access cookbook

Open access guerilla cookbook

https://github.com/c0nt3nt/oagcookbook
https://github.com/c0nt3nt/oagcookbook/blob/master/oagcb.md

git clone git://github.com/c0nt3nt/oagcookbook.git

"""
The Open Access Guerilla Cookbook
===============================

Dedicated to Aaron Swartz (1986-2013)

Introduction
----------------
In 2008, a short and passionate appeal appeared online which called for all
of us to, "take information, wherever it is stored, make our copies and
share them with the world. We need to take stuff that's out of copyright
and add it to the archive. We need to buy secret databases and put them on
the Web. We need to download scientific journals and upload them to file
sharing networks. We need to fight for Guerilla Open Access."

In January, 2013, its author Aaron Swartz took his own life. His many
achievements included an important initiative to liberate the PACER court
records database and leadership roles in several movements that support
free culture. In the last years of his life, he faced serious criminal
charges for following his own manifesto and launching one of the largest
content liberation efforts to date: the downloading of millions of articles
from the JSTOR database.

If the threat of decades of prison time for the JSTOR raid was designed to
strike fear into OA guerillas everywhere, the tragic death of this selfless
supporter of the movement should be met with a renewed commitment to our
ideals. It also offers us an opportunity to up our game.

This document aims to take the manifesto to the next step. This first
version is merely an opening call, and it is in no way complete. Hopefully,
this cookbook will grow to include many recipes and instructional essays
for use by the open access guerilla. Improve on it, share it, and use it.

Principles
-------------

The guerilla open access movement is founded on the basic tenet that the
efforts to promote open access by operating exclusively within existing
copyright regimes and attempts to reform these copyright regimes through
legal reform do not go far enough to protect and expand the realm of free
culture. Working in parallel, but often in the face of criticism from those
promoting legal means to achieve open access, our guerilla movement accepts
the active violation of copyrights and contractual terms of use as
justified. We promote the mass liberation of content from commercial as
well as non-profit or governmental databases for the greater purpose of
sharing knowledge and culture with everyone.

We are closely allied with movements promoting government and corporate
transparency, but open access guerillas recognize that there is
responsibility in sharing information. When relevant, precautions should be
taken to defend the safety and privacy of individuals and communities in
appropriate ways.

We are pirates, but accept a moral imperative to loot more than we need for
our own purposes, and share widely everything we find. We categorically
reject descriptions of our acts as theft. We do not deprive humanity of
culture, we reproduce it. We do not rob owners of their property, but in
many cases violate their temporary and exclusive monopoly to profit by it
in an age when the reasonable limits first set upon this monopoly have been
long forgotten. We violate crippling terms of use on content in an age when
access to almost all information requires accepting contractual limitations
that few ever read. We re-release materials that are already in the public
domain but locked behind paywalls or targeted by copyfraud.

Looking forward, the strength of the guerilla open access movement depends
on combining efforts both secret and open, both collaborative and
individual. We must:

**Share skills and experiences as well as content**

Content liberation should not solely depend on a few individuals with
highly technical knowledge. We must work harder to share our skills and our
experiences widely. This should include efforts to better reach out to
those who are new but eager to learn the more challenging technical side of
our guerilla efforts. We should also work towards establishing standards in
the quality of our content collections, security practices to protect our
illegal efforts, and a code of ethics for operating with the content
sources we raid.

**Recognize a diversity of roles and a diversity of approaches**

We must abandon the image of the lone hacker as the symbol of our movement
and recognize that any successful guerilla movement depends on the work of
people filling many different roles. Some of these are described below.

We are radical in our means, dedicated to our ideals, and will be reviled
and ridiculed by many. Many with similar goals to our own reject us, but
they should still be considered as allies. Our movement exists within an
ecology of culture creation, curation, and consumption. We must respect
everyone who plays a role in the interdependent whole, even as we oppose
the legal regime under which they operate. There are many artists, writers,
scholars, archivists, librarians, developers, and non-profit organizations
who strongly oppose us. They argue that their livelihoods are threatened by
our actions, while others secretly sympathize with us. Let us be mature
enough to admit that some of our targets produced their collections of
content with little funding, charge their access fees with no mind to
profit themselves, and host servers with a barebones maintenance staff.
When we liberate their content, or share with others, keep always in mind
the work that was put into creating and publishing it. Show as much respect
as is compatible with our goals and remember that we are nothing without
them.

**Collaborate to reduce risk and maximize scale**

We now live in the world of crowdsourcing. The power of a lone hacker armed
with a scraper and some knowledge of security is not to be underestimated,
but it comes at great risk of discovery and sacrifice. We must explore ways
to better combine our efforts. The RECAP effort for the PACER archive of
court documents is one model of how this approach works within the legal
realm; we can learn from it and others. We should develop the means to
conceal systematic content liberation in the invisible mass of everyday
consumption. Rather than grabbing whole archives and document databases at
once, we should take them in smaller pieces, with care to preserve their
metadata integrity, and plans in place to reassemble the whole when an
operation is complete.

**Segregate open from secret action**

Aaron Swartz combined strong public advocacy with secret guerilla action.
In his case, it was not key to his discovery, but it is likely to have
impacted the severity of the charges brought against him. In all guerilla
movements it is important to segregate open from secret action. It is
unwise to be an open voice for radical illegal action and also its agent.
If you begin engaging in dangerous OA guerilla action, temper your public
voice and avoid drawing attention to yourself, especially with regard to
the virtues of illegal content liberation.

**Protect the Public Domain First**

Almost none of us are against the principle of limited terms of copyright
as a way to promote the long term expansion of the public domain. The
passion that drives most of us to these radical measures would not exist in a
world with copyright terms of ten or fifteen years. At this writing, there
are very few signs that legal reforms will move us back to these limits,
and on the contrary, in many places around the world the trend is in the
opposite direction.

More threatening, however, is the assault upon what is already in the
public domain. In acts of copyfraud, publishers and digital service
providers claim rights on content they do not own. They claim that new
rights are produced in their digitization and, increasingly, they are
moving away from copyright to contractual terms of use to limit our freedom
to use public domain materials obtained from their databases. The ever
increasing proportion of our cultural heritage bound by these contractual
restrictions will have direct consequences. The trend significantly drains
support for initiatives to create fully open and free databases of
materials that may already have been made available by more restrictive
ventures, whether they are commercial or non-profit.

For these reasons, our movement's priority should be on the liberation of
content in the public domain followed by those materials that, by any just
limited term of copyright, should have been in the public domain decades ago.

Roles in the Movement
-----------------------------
There are many ways to further the goals of the guerilla open access
movement. Find one or several of the following roles that you feel
comfortable performing.

**The Advocate**

The advocate promotes the cause of open access. Many OA and copyright
reform advocates believe we rob their efforts of legitimacy by making it
easier for content industries to smear them with our sins. Others believe,
on principle, that OA must depend always on voluntary sharing. As one
leader in the legal OA movement puts it, "There is no vigilante OA, no
infringing, expropriating, or piratical OA." We disagree. We believe our
efforts usually complement those of the leading proponents of free culture
and copyright reform. We do not accept that it is a zero sum game in which
the efforts of one destroy those of the other. However, if you wish to
focus on the role of a public advocate, it would be prudent to join them in
their open rejection of our methods and limit any other guerilla activities.

Another kind of advocate is less public: to promote guerilla OA as well as
legal OA among your friends, colleagues, and those who have the skills to
be of use to the movement. Of particular importance are efforts to convert
the casual pirate into the open access guerilla; from someone who copies
content only for their own consumption to someone who recognizes the value
of a more active and altruistic participation in the movement.

**The Prospector**

The role of Prospector is that of the scout for the movement. A Prospector
identifies databases or collections of interest to the movement and
collects information about their workings. What kind and how much content is
there? How is it organized? What metadata is provided? What is the URL
structure for the database? What is required for access and who provides
the service? And so on.

We need to design a good system for compiling and sharing information of
this nature among us, so that the Armorer, the Sapper, or the Traitor
can do their work.

**The Scribe**

The Scribe is a unique role. Scholars and collectors of all kinds have
massive collections of material obtained by photographing, scanning, or
transcribing documents or assembling other digital assets. The resulting
digitized content often sits on their own hard drives and is used in only
one or a few publications or exhibits.

A Scribe in the movement is conscious of the importance of their videos,
images, sounds, and digital archive photos. When reasonable, the Scribe
collects or takes photos that go beyond their own limited interest, or
transcribes or indexes materials that may be of interest to others. They
organize their information to the extent possible and make efforts to share
their files widely. Materials not protected by copyright are to be made
public, directly posted online as public domain. When materials are
suspected of being protected by copyright, they are distributed through
other means or deposited with a Custodian. The Scribe is one of the roles
that must take particular note of the responsibilities that go along with
the safety and privacy of individuals and communities affected by the
contents of the materials they digitize.

To facilitate the full integration of the Scribe into the movement, we must
work towards better systems of making these collections easy to share, and
a standard for describing and organizing them.

**The Courier**

The Courier is usually someone who has received collections of materials
from another OA guerilla. They do not merely use the materials themselves,
but recognize their obligation to help further share and distribute the
materials widely.

While taking measures to protect themselves, the Courier makes efforts to
share the materials online through torrents or other private repositories
and servers, possibly coordinating these efforts with a Custodian. They
share copies of the collections on portable storage media throughout their
personal networks.

Another form of Courier plays the role of communication mediator between
guerillas that should work to be kept in isolation from each other, such as
between the Traitor and Prospector and the Armorer.

**The Innkeeper**

The role of the Innkeeper is to manage the safe houses of our movement. We
need safe and secure places to communicate with each other anonymously.
Ideally, these places should be kept isolated from any servers operated by
the Custodian so that discovery of one does not compromise the other. The
Innkeeper may be willing to host the work produced by the Armorer, various
versions and updates of this cookbook, and other instructional materials.

An Innkeeper must be willing to maintain a communication network that will
likely come under attack from hired hackers, botnets, or directly by
principled opponents. They must have precautions in place to destroy
anything that might betray the identity of our members. They should have
plans to rapidly reproduce the network at a new location when taken down.
They must lead efforts to detect moles and informants within the network
and deny them access.

**The Armorer**

The Armorer is one of the most important roles in our movement. They
create, maintain, and supply our movement with the weapons we need to carry
out our raids. They write and update the scrapers to liberate content. They
create the processing scripts to organize our files, and they design the
protective measures that conceal our efforts.

In the past, the open access guerilla has often been an Armorer, a Traitor,
and Custodian all in one. One huge disadvantage to this is that the Armorer
who is also the Traitor cannot easily share their tools without potentially
coming under scrutiny for launching raids themselves. If discovered, as
Custodian, their liberated materials are potentially surrendered and lost.

We must work together. If an Armorer works together with Prospectors to
identify targets and design scrapers, but maintains some distance from (or
at least communicates anonymously with) the Traitors who will deploy them,
we will stand a better chance of a successful operation. The Armorer may
choose to work openly, if they protect connections to others in the
movement. Writing a scraper is not necessarily a crime, but it is strongly
suggested that efforts are made to limit distribution to a trusted network,
at least until an operation is complete. This will delay any
countermeasures by content distributors. More broadly, however, Armorers
should be willing to share, through work such as adding recipes to this
document, tutorials on general approaches to scraping databases and
archives.

**The Sapper**

The Sapper is a special kind of Armorer. The Armorer serves primarily the
Traitor, who will deploy scrapers from within a paywall or behind
restrictive terms of service. The Sapper explores ways to infiltrate the
security of archives and databases and enable outsiders direct access. They
may hack servers directly in order to enable a full and immediate grab of
the databases within. They may create access tunnels for a more cautious
silent raid from the outside. Or, in the most extreme case, they may bring
about a temporary destruction of security to allow large numbers of users
to storm the database in a mass action.

The role of Sapper requires the greatest amount of skill and assumes the
greatest amount of risk, both to the Sapper and the movement as a whole.
Sappers should carefully consider the consequences of their actions and the
impact on the ecology of content creation, curation, and consumption. A
Sapper's raid, depending on how it is carried out, can be a massive act of
sabotage, and has the greatest potential to generate anger and fear from
our opponents but also put pressure on our sympathizers. Use of it as a
tactic should be carefully considered, and great care taken in selecting
the target, the timing, and the approach.

**The Traitor**

The Traitor is at the heart of the content liberation effort. They have
legitimate and legal access to content targeted for liberation and release
what they take to the Custodians and Couriers. They often depend on their
special access to carry out their own daily tasks, and beyond legal
consequences, may sacrifice much if their actions are discovered and their
access revoked. Beyond this risk, the Traitor often has conflicted
loyalties. They have received their access in trust, and by helping the
guerilla open access movement, they are inevitably betraying that trust.

The Traitor may know especially well the great efforts required to fund,
produce, curate, and host large collections of data. Even as they liberate
content, they may be concerned that their actions will contravene the
wishes and have some impact on people who may have only very reluctantly
accepted the restrictive copyrights and terms of use that have been forced
upon the product of their efforts by the institutions they serve. If you
work with a Traitor, be sensitive to these concerns, and respectful of what
may seem like arbitrary limitations they wish to place on the scale and
nature of their cooperation with the movement.

As the one opening the gate, the risky work of the Traitor should ideally
be carried out in total secrecy. They should communicate securely and
anonymously with other members of the movement, and limit their other
roles. They do not need the Armorer's technical skills if they can obtain
(indirectly, through a Courier, or directly and anonymously from a movement
resource) the necessary scrapers and other tools produced by the Armorers.
It is important, however, for them to become familiar with security
measures to protect their identity and conceal their liberation efforts.
They should also learn enough about the scrapers etc. that they use so that
they can run them on their own computers and adhere to the basic scraper
guidelines (below).

**The Custodian**

Traitors or Sappers who have liberated content should move as quickly as
possible to deposit this content with Custodians. The role of the Custodian
is first and foremost that of canonical preservation. They also play an
important role as the primary distributor of content to Couriers. They
should also keep themselves informed about the safest and most effective
means to widely distribute content on file sharing networks, secret
repositories, and by other means. When hosts have been raided; copies of
content have been taken down by legal authorities; or hosted copies
disappear through neglect (lack of seeders for torrents, etc.), it is the
job of Custodians to take measures to get the content to new sources.

Whenever possible the Custodian should check that the collections they
received do not bear traces of the origin, looking for signs of watermarked
PDFs, scraper files with revealing login information, or other signs that
would reveal the identity of the guerilla who liberated the content.

Custodians should take precautions against their own discovery, and arrange
for copies of materials in their care to be deposited in a safe place
should they be discovered. They should also ideally limit actions in other
roles of the movement to limit the risk of exposure.

**The Archivist**

The role of the Archivist is to preserve and improve the integrity of
liberated content. They identify missing or problematic metadata, they
process and organize files, and potentially provide conversions of
problematic formats that materials are found in. If independent operations
liberate parts of collections, the Archivist can help bring them together.
They create note documents to include in the distribution of liberated
content which describes the scope of the collection, identifies problems in
the material, provides information on the originating source, and suggests
ways to cite it. They collaborate with Prospectors to identify further work
that needs to be done on already raided collections, and with Custodians to
spread the best possible version of a collection.

Archivists with strong digital skills can also create the tools and
platforms, both local and hosted, that will allow users to conveniently
search and browse liberated content. A zip file full of PDFs or movie files
is an order of magnitude less useful than a collection which is well
indexed, annotated, and conveniently searchable.

**The Sculptor**

The Sculptor is someone who is willing to use and create something new with
liberated content. They analyze and study collections for use in their own
work. They produce new works of art and culture. They remash, reproduce,
and transform liberated content. They generate innovative ways for others
to manipulate and use content.

Ideally we should all be Sculptors. The value of the guerilla open access
movement comes from facilitating the Sculptors of today, tomorrow, and
every day that content would otherwise be locked away behind copyright and
restricted use.

Scraper Guidelines
------------------------------

Scrapers are scripts designed to selectively extract content from servers.
They are often written in scripting languages such as Python, Ruby, or Perl
that can be run from a variety of operating systems. They are one of the
most powerful tools of the guerilla open access movement.

When scrapers or other raiding tools are designed by the Armorer and
deployed by the Traitor, the two general principles of *respect* and
*concealment* should be followed. For this purpose keep these guidelines in
mind:

1. Minimize disruption to the host and their other users by
2. Limiting the scale and speed of a raid appropriately and
3. Employing methods to mask your raid as reasonable use of the target
resource

More specifically:

- Never attempt to grab an entire database in the space of hours or over a
very short time span relative to its total size
- Generally limit the practice of multiple simultaneous downloads from a
server.
- If a small number of simultaneous downloads are made, carry it out from
multiple network locations and ideally with multiple access credentials.
- Do not place others at risk by using their network access credentials
unless they understand the potential consequences and volunteer.
- Employ proxies, MAC spoofing, and other measures as needed to conceal
access locations when this is possible.
- Use random intervals between downloads to simulate human behavior
- Limit the operation of scrapers to certain hours to simulate human
behavior or bury activity in periods of large regular traffic
- Design scrapers to download materials randomly (while logging completed
downloads) rather than in sequence, or else in random groups of smaller
sequences consistent with human behavior

**Recipes**
-----------------

The remainder of this cookbook should be composed of recipes. These may
include instructions for the use of or the code for scrapers and other
tools that are appropriate for wider distribution. They may include
tutorials and descriptions of good practices for the various roles of the
movement. They may describe appropriate security measures. They may recount
past victories and failures of the movement, but in a way that does not
compromise anyone's identity. They should ideally not include any direct
links to online resources, but may provide suggestions on how to use search
techniques to find them, as they may often move. Under each recipe, include
the date it was written, an optional author pseudonym, and if you
distribute an edited version of this document, update the timestamp and
version information at the bottom for the cookbook as a whole.

Securing Communication
--------------------------------
By yellowElephant

Security in communication between members of the movement is of great
importance. While legal authorities will want to investigate our actions,
perhaps a greater threat is that publishers and content industries that
have significant resources will want to uncover who we are and can
easily outsource their work to hired hackers.

It is important to follow some basic guidelines that can be grouped as
follows:

1. Create barriers of separation between your regular activity and movement
activity
2. Mask your identity
3. Mask your location
4. Encrypt your information
5. Limited Circles of Trust

**Separate your Movement Activity**

The first principle is a general one that you should always keep in mind.
Whenever possible, create a separate sphere for all things concerned with
your activity in the movement. Some of these things are just common sense.
If you email about the movement, do so from a separate email account, not
your regular personal email account. If you tweet about the movement, then
unless you are only an Advocate who is not connected with illegal
operations of the movement, tweet from a separate account. If you write
about the movement on a website or other online service (again unless you
are only an Advocate) then do so from accounts set up for the purpose.

If you need to set up online service accounts that require email
verification use email addresses that come from an online email service
which does not require you to provide further means of identification. This
secures your anonymity as long as you mask your location.

When you are working on something related to the movement, even if you have
masked your location, avoid doing other things online that may associate
your IP address with other activity online and thus make it possible to
trace back to you.

Consider using a different browser for all your movement activity. Or, at
the very least, use "privacy" or "secure" or "incognito" mode in your
browser. This will prevent your browsing history from being saved. More
importantly it will prevent cookies from operation. If you are logged into
a social networking service and then start doing movement activity without
this layer of security in another tab or window, the website you are using
may have ways to identify you through cookies.

**Mask Your Identity**

If you are a Courier or an Advocate, what you write and do will be exposed
to the scrutiny of those outside the movement. Adopt a code name, but be
sure to choose one that cannot be associated with you, even by friends.
Thus, if you are a famous dog trainer, don't choose a code name of a breed
of dog.

What you write can be subject to automatic text analysis. Authorship can be
compared by means of algorithms. Try to vary your writing style. Make a
list of idiosyncratic adjectives or other turns of phrase that you only use
in some texts you write but not others. Write verbosely in some places, and
in a short blunt manner in another. Use leet speak in some contexts, and
grammatically correct language elsewhere. If you are exposed, this will
help prevent you from being associated with all of your activity.

**Mask Your Location**

This is very important. Do not engage in movement activity from your own IP
address (your identifying address on the internet) and spoof your MAC (the
hardware address on your network or wireless card). If you are connected to
a University of Vienna computer network, and you do anything online without
masking your location, it is easy to trace the activity back to the
university network, which will likely have a log of who is registered to
use the IP, or at least the rough physical location that the person
connected from. From there it is either direct discovery or discovery with
subsequent surveillance.

*Mask your IP*

Use a free or paid VPN (virtual private network) service that cannot be
traced back to you personally. Make sure to configure it so that all
traffic to and from your computer is routed through this VPN. Connected to
a VPN in Russia, a user on the network in Vienna will appear to be
connecting from Russia.

*Spoof your MAC*

This is easily done with a little bit of experience on the command line.
There are many utilities that can help you do this on Windows. On Linux or
OS X simply open a terminal and enter the appropriate command that you can
find many places online.

*Fool Your Enemy*

If, as a Courier for example, you engage in movement activity that may give
indirect hints about your geographical location, then cloud their view. If
you have a twitter account and are based in Canada, follow a set of users
that might imply you are in France and read French. If you are emailing
users in the UK, consider doing it from a free email service based in
Germany.

**Encrypt Your Communication**

All movement related activity through email and chat should be secure or
anonymous and generally both.

*Email and General Encryption*

Learn about GnuPG and the basics of public-private key encryption. Create
an encryption key for yourself. Make sure the passphrase is very long. "The
yellow elephant flew effortlessly behind the old barn" is far more secure
than "&%tX90!" and easier to remember. Export the public key as ASCII
armored and give it to your movement contacts. They will use *your* public
key to encrypt email they send to you. You will use *your* corresponding
secret key to decrypt the email they sent you that was encrypted by *your*
public key. The process is reversed when you email them. You will need
*their* public key to email them. Sign your communications with your key to
help confirm your identity.
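
The passphrase comparison above can be made concrete with a rough entropy
estimate. The sketch below is illustrative only. It assumes the short password
is seven characters drawn uniformly from the 95 printable ASCII characters, and
that the passphrase is nine words drawn at random from a hypothetical
7,776-word list (the size used by Diceware-style lists). A sentence composed by
a person, like the example above, is less random than that, so the passphrase
figure should be read as an upper bound.

```python
import math


def uniform_entropy_bits(choices: int, length: int) -> float:
    """Entropy in bits of `length` symbols, each drawn uniformly from `choices` options."""
    return length * math.log2(choices)


# Assumption: 7 characters, each one of the 95 printable ASCII characters.
short_password_bits = uniform_entropy_bits(95, 7)

# Assumption: 9 words, each picked at random from a 7,776-word list.
passphrase_bits = uniform_entropy_bits(7776, 9)

print(f"7 random characters: {short_password_bits:.1f} bits")
print(f"9 random words:      {passphrase_bits:.1f} bits")
```

Under these assumptions the word-based passphrase carries more than twice as
many bits as the short password, which is the point the comparison makes:
length contributes more than exotic symbols do.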

*Secure Chat*

Consider using anonymous communication on a pre-agreed IRC chat for
something that is simple and fast and can be performed directly in your
(secure/incognito) browser (while connected to VPN). There are also many
plug-ins to secure communication on popular chat protocols, as well as some
dedicated secure chat clients. Make sure that neither you nor the person
you are speaking with have the client configured to log the communication.

*Keep Movement Related Files Secure*

If you are apprehended all your computers will be taken. You cannot be
coerced into providing passwords though, so if you are smart, you will keep
movement files completely secure.

You can encrypt your files and directories directly with GnuPG. Another
common method is to create an encrypted partition or "disk image". Boot
this partition or disk image when you need access to your
movement files. Or have a separate encrypted hard drive that you load when
you are ready to do work for the movement.

**Limited Circles of Trust**

Do not tell all your cool friends that you are active in the movement.
Limit this information to people you trust *and* who you believe can be
recruited to active participation in the movement. Whenever possible,
maintain the segregated guerilla cell structure under which the movement
has operated until now. If you recruit several people to work with you, do
not pass on information about their identity or contact info to the person
who recruited you. If you have contact with Couriers, let your up-contact
know of their existence so tasks to/from the Courier can be transmitted via
you. One exception is competent Armorers. Their skills are in high demand,
and if you find them, consider keeping them separate from your own cell and
"passing them up" the chain to the person who recruited you, or put them in
contact with an Armorer or Innkeeper you know of. The scripts/tools they
create should be distributed through the network securely and then, at some
appropriate point, openly. As far as this author is aware, we have no
centralized structure. We operate as a loose, cell-based movement, with
anonymous cross-cell communication in online forums and chat maintained by
admins (Innkeepers) who don't usually know the identity of anyone on the
platforms they manage. Knowledge of personal real identities should,
whenever possible, *not* be available beyond the individual cell. Traitors
(and the rarer Sapper) are to be protected at all costs since they are
directly liable to legal prosecution.

If there are people who you personally know but do not trust, yet who you
think can be recruited, use a Courier as an intermediary. The Courier
establishes contact, and if it goes well, they can "hand back" or "pass on"
the contact once secure anonymous means of communication have been
established and a role identified.

Cells whose members personally know each other should take steps to ensure
that keys were exchanged in a way that guarantees their genuine nature.

Limiting the circle of trust is key to the survival of the movement. In the
past, many online hacker networks have been broken when one member is
discovered and agrees to cooperate in exchange for a lighter sentence. A
previously trusted person may then do all they can to expose other cell
members and escape their own punishment. Be suspicious of sudden attempts
by contacts in the movement to get more personal information from you.
Again, by keeping the circle of trust small, damage can be contained.

Ways to Make Your Scrapers More Human
-----------------------------------------------------

Below are some simple code snippets to give you ideas about how to design
scrapers that will behave more like humans, thus concealing to some degree
your raid on a database. Of course, if you conduct the entire raid from a
single IP address, or you are always logging in with the same user
credentials, a careful analysis of server logs and traffic can lead to
discovery. However, the following may help fool the lazy server
administrator who merely glances over access logs from time to time.

*In Ruby:*

```ruby
# => This method can be used to create some variety in pauses between
# => grabbing files from a server and thus simulate human behavior. You set
# => a default break time (here twoseconds, in reality a random time of 1-3
# => seconds) and several rarer, longer breaks. You then indicate a
# => probability for these longer breaks to occur. For example, currently
# => there is a 0.2% chance that the scraper will take a roughly one hour
# => break, a 0.5% chance it will take a roughly 30 minute break, and so on.
def randomBreak()
  # set the various times for a break (all in seconds):
  twoseconds = rand(1..3)
  fifteensec = rand(12..18)
  onemin     = rand(50..70)
  fivemin    = rand(250..350)
  tenmin     = rand(500..700)
  thirtymin  = rand(1600..2000)
  onehour    = rand(3300..3900)
  # default break time:
  sleeptime = twoseconds
  # roll the dice:
  x = rand(1..1000)
  # small percentage chance that there is a longer break:
  if (1..2).member?(x) then sleeptime = onehour end
  if (3..7).member?(x) then sleeptime = thirtymin end
  if (8..18).member?(x) then sleeptime = tenmin end
  if (19..32).member?(x) then sleeptime = fivemin end
  if (33..50).member?(x) then sleeptime = onemin end
  if (51..120).member?(x) then sleeptime = fifteensec end

  puts "Sleeping " + sleeptime.to_s + " seconds."
  sleep sleeptime
end

randomBreak()

#!/usr/local/bin/ruby

# => This method can be used to force your script to only operate within
# => certain "working hours" and thus simulate human behavior. If you have
# => a scraper pulling files 24 hours a day, anyone inspecting server logs
# => closely will immediately know automated downloading is happening.
# => Use this script to make it at least somewhat more plausible that a
# => very diligent human being was downloading files from their favorite
# => database.
# =>
# => Call this method once each time a file has been downloaded completely
# => before proceeding to the next round of the loop. Only proceed if the
# => method returns true. If it is "outside working hours" it will return
# => false. You can use a while loop to wait until working hours.

# fromgmt - how many hours earlier or later than GMT timezone you wish to
#           operate in

def workinghours(fromgmt=0, starthour=9, endhour=17)
  # The current hour at GMT timezone:
  currentgmt = Time.new.gmtime.hour
  # The current hour at timezone of desired operation, wrapped into 0..23
  # (the modulo also handles negative offsets and results above 24):
  currenthour = (currentgmt + fromgmt) % 24
  # we assume here working hours do not cross midnight, anyone want to redo
  # the following to account for that possibility?
  # Note that endhour is inclusive: endhour=17 keeps working until 17:59.
  return currenthour >= starthour && currenthour <= endhour
end

# how we might use this:
while !workinghours(1, 1, 11)
  # wait one minute and check the time again
  sleep 60
end

#!/usr/local/bin/ruby

# => This very simple snippet can be used to set a maximum limit on
# => the time the script will run. Here again we simulate likely human
# => behavior. You can run the script comfortable that it will run
# => for a limited time.

# maxminutes - The number of minutes you want this script to run
maxminutes = 120 # for example, 2 hours or 120 minutes
starttime = Time.new

loop do
  # YOUR MAIN LOOP BODY GOES HERE

  if Time.new - starttime > maxminutes * 60 then
    break # out of your loop and wrap up the script...
  end
end

#!/usr/local/bin/ruby

# => If you use Mechanize in Ruby to do your scraping you can set the "user
# => agent," that is, the browser that you are pretending to be when you
# => grab files from the server. You can add some noise to your activity by
# => choosing a random user agent each time you use the script.

def randUserAgent()
  # returns a random user agent but is weighted towards popular browsers
  # problem: it doesn't include Chrome or Opera. Sample is for Ruby 1.9,
  # need change for earlier versions of Ruby.
  return ["Windows IE 7", "Windows IE 7", "Windows IE 7", "Windows IE 7",
          "Windows IE 6", "Windows Mozilla", "Windows Mozilla",
          "Windows Mozilla", "Windows Mozilla", "Mac Safari", "Mac FireFox",
          "Mac FireFox", "Linux Firefox"].sample
end

# => ...
myparser.user_agent_alias = randUserAgent()

```
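
The working-hours check above assumes the window does not cross midnight. One possible way to lift that assumption is sketched below. The names `in_working_window?` and `workinghours_wrapping`, and the extra `now` parameter (added only so the check can be tried with a fixed time), are hypothetical and not part of the original snippets.

```ruby
# Hypothetical helper: true if hour (0..23) lies inside the window.
# When starthour > endhour the window is taken to wrap past midnight,
# e.g. 22 -> 2 covers hours 22, 23, 0, 1 and 2 (endhour stays inclusive).
def in_working_window?(hour, starthour, endhour)
  if starthour <= endhour
    hour >= starthour && hour <= endhour
  else
    hour >= starthour || hour <= endhour
  end
end

# Same idea as workinghours, with a wrap-aware comparison.
# now - the time to test; defaults to the current time.
def workinghours_wrapping(fromgmt = 0, starthour = 9, endhour = 17, now = Time.now)
  # getgm returns a UTC copy and leaves the original Time untouched
  currenthour = (now.getgm.hour + fromgmt) % 24
  in_working_window?(currenthour, starthour, endhour)
end

# Example with a fixed time: 23:30 GMT at an offset of +1 is hour 0 locally.
puts workinghours_wrapping(1, 22, 2, Time.utc(2013, 1, 20, 23, 30))
```

With `starthour` less than or equal to `endhour` the comparison reduces to the original one, so the existing defaults behave as before.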

Recipe: Overcome Built-In Limit on JSTOR Liberator
---------------------------------------------------------------------
By FreeDam | Status: Works as of 2013.1.20

The Aaron Swartz Memorial JSTOR Liberator offers an elegant tool for
uploading a single publicly viewable JSTOR document to a memorial
archive. The tool creates a cookie in your browser when you use it and
prevents you from making more than one contribution. Perhaps you want to
offer ten PDFs to the memorial? Maybe fifty?

To overcome the limit without repeatedly opening a new browser window, we
can modify your JSTOR Liberator bookmarklet code slightly:

```javascript
javascript:var date=new Date();date.setTime(date.getTime()+(10*24*60*60*1000));var expires = "; expires="+date.toGMTString();document.cookie='jstorLib_count=0;'+expires+'; path=/';(function()%7Bvar%20s%3Ddocument.createElement(%27script%27)%3Bs.type%3D%27text/javascript%27%3Bs.src%3D%27http://aaronsw.archiveteam.org/js%27%3Bdocument.getElementsByTagName(%27head%27)%5B0%5D.appendChild(s)%3B%7D)()%3B
```

This merely resets the "jstorLib_count" cookie to 0, thus allowing you to
make multiple submissions. This client-side limit will be replaced by a
server-based limit at some point, but in the meantime, you can use this
modified bookmarklet to make multiple contributions easily.


This document is in the public domain.

Changes
--------------------

2013.1.13 williwaw - original posted

2013.1.16 yellowElephant - added recommended security precautions for
movement members

2013.1.19 kfogel - language

2013.1.19-20 williwaw - ruby.recipe add some ways to simulate human
behavior in scraper

2013.1.20 FreeDam - Overcome Built-in limit on JSTOR Liberator
"""

- Bryan
http://heybryan.org/
1 512 203 0507


----- End forwarded message -----
--
Eugen* Leitl <a href="http://leitl.org">leitl</a> http://leitl.org
______________________________________________________________
ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE




The Open Access Guerilla Cookbook
Dedicated to Aaron Swartz (1986-2013)

Introduction
In 2008, a short and passionate appeal appeared online which called for all of us to, "take
information, wherever it is stored, make our copies and share them with the world. We need to
take stuff that's out of copyright and add it to the archive. We need to buy secret databases and
put them on the Web. We need to download scientific journals and upload them to file sharing
networks. We need to fight for Guerilla Open Access."

In January, 2013, its author Aaron Swartz took his own life. Among his many achievements
included an important initiative to liberate the PACER court records database and leadership
roles in several movements that support free culture. In the last years of his life, he faced serious
criminal charges for following his own manifesto and launching one of the largest content
liberation efforts to date: the downloading of millions of articles from the JSTOR database.

If the threat of decades of prison time for the JSTOR raid was designed to strike fear into OA
guerillas everywhere, the tragic death of this selfless supporter of the movement should be met
with a renewed commitment to our ideals. It also offers us an opportunity to up our game.

This document aims to take the manifesto to the next step. This first version is merely an opening
call, and it is in no way complete. Hopefully, this cookbook will grow to include many recipes
and instructional essays for use by the open access guerilla. Improve on it, share it, and use it.

Principles
The guerilla open access movement is founded on the basic tenant that the efforts to promote
open access by operating exclusively within existing copyright regimes and attempts to reform
these copyright regimes through legal reform do not go far enough to protect and expand the
realm of free culture. Working in parallel, but often in the face of criticism from those promoting
legal means to achieve open access, our guerilla movement accepts the active violation of
copyrights and contractual terms of use as justified. We promote the mass liberation of content
from commercial as well as non-profit or governmental databases for the greater purpose of
sharing knowledge and culture with everyone.

We are closely allied with movements promoting government and corporate transparency, but
open access guerillas recognize that there is responsibility in sharing information. When relevant,
precautions should be taken to defend the safety and privacy of individuals and communities in
appropriate ways.

We are pirates, but accept a moral imperative to loot more than we need for our own purposes,
and share widely everything we find. We categorically reject descriptions of our acts as theft. We
do not deprive humanity of culture, we reproduce it. We do not rob owners of their property, but
in many cases violate their temporary and exclusive monopoly to profit by it in an age when the
reasonable limits first set upon this monopoly have been long forgotten. We violate crippling
terms of use on content in an age when access to almost all information requires accepting
contractual limitations that few ever read. We re-release materials that are already in the public
domain but locked behind paywalls or targetted by copyfraud.
Looking forward, the strength of the guerilla open access movement depends on combining
efforts both secret and open, both collaborative and individual. We must:

Share skills and experiences as well as content

Content liberation should not solely depend on a few individuals with highly technical
knowledge. We must work harder to share our skills and our experiences widely. This should
include efforts to better reach out to those who are new but eager to learn the more challenging
technical side of our guerilla efforts. We should also work towards establishing standards in the
quality of our content collections, security practices to protect our illegal efforts, and a code of
ethics for operating with the content sources we raid.

Recognize a diversity of roles and a diversity of approaches

We must abandon the image of the lone hacker as the symbol of our movement and recognize
that any successful guerilla movement depends on the work of people filling many different roles.
Some of these are described below.

We are radical in our means, dedicated to our ideals, and will be reviled and ridiculed by many.
Many with similar goals to our own reject us, but they should still be considered as allies. Our
movement exists within an ecology of culture creation, curation, and consumption. We must
respect everyone who plays a role in the interdependent whole, even as we oppose the legal
regime under which they operate. There are many artists, writers, scholars, archivists, librarians,
developers, and non-profit organizations who strongly oppose us. They argue that their
livelihoods are threatened by our actions, while others secretely sympathize with us. Let us be
mature enough to admit that some of our targets produced their collections of content with little
funding, charge their access fees with no mind to profit themselves, and host servers with a
barebones maintenance staff. When we liberate their content, or share with others, keep always
in mind the work that was put into creating and publishing it. Show as much respect as is
compatible with our goals and remember that we are nothing without them.

Collaborate to reduce risk and maximize scale

We now live in the world of crowdsourcing. The power of a lone hacker armed with a scraper
and some knowledge of security is not to be underestimated, but it comes at great risk of
discovery and sacrifice. We must explore ways to better combine our efforts. The RECAP effort
for the PACER archive of court documents is one model of how this approach works within the
legal realm, we can learn from it and others. We should develop the means to conceal systematic
content liberation in the invisible mass of everyday consumption. Rather than grabbing whole
archives and document databases at once, we should take them in smaller pieces, with care to
preserve their metadata integrity, and plans in place to reassemble the whole when an operation
is complete.

Segregate open from secret action
Aaron Swartz combined strong public advocacy with secret guerilla action. In his case, it was not
key to his discovery, but it is likely to have impacted the severity of the charges brought against
him. In all guerilla movements it is important to segregate open from secret action. It is unwise
to be an open voice for radical illegal action and also its agent. If you begin engaging in
dangerous OA guerilla action, temper your public voice and avoid drawing attention to yourself,
especially with regard to the virtues of illegal content liberation.

Protect the Public Domain First

Almost none of us are against the principle of limited terms of copyright as a way to promote the
long term expansion of the public domain. The passion that drives most us to these radical
measures would not exist in a world with copyright terms of ten or fifteen years. At this writing,
there are very few signs that legal reforms will move us back to these limits, and on the contrary,
in many places around the world the trend is in the opposite direction.

More threatening, however, is the assault upon what is already in the public domain. In acts of
copyfraud, publishers and digital service providers claim rights on content they do not own. They
claim that new rights are produced in their digitization and, increasingly, they are moving away
from copyright to contractual terms of use to limit our freedom to use public domain materials
obtained from their databases. The ever increasing proportion of our cultural heritage bound by
these contractual restrictions will have direct consequences. The trend significantly drains
support for initiatives to create fully open and free databases of materials that may already have
been made available by more restrictive ventures, whether they are commercial or non-profit.

For these reasons, our movement's priority should be on the liberation of content in the public
domain followed by those materials that, by any just limited term of copyright, should have in
the public domain decades ago.

Roles in the Movement
There are many ways to further the goals of the guerilla open access movement. Find one or
several of the following roles that you feel comfortable performing.

The Advocate

The advocate promotes the cause of open access. Many OA and copyright reform advocates
believe we rob their efforts of legitimacy by making it easier for content industries to smear them
with our sins. Others believe, on principal, that OA must depend always on voluntary sharing. As
one leader in the legal OA movement puts it, "There is no vigilante OA, no infringing,
expropriating, or piratical OA." We disagree. We believe our efforts usually compliment those of
the leading proponents of free culture and copyright reform. We do not accept that it is a zero
sum game in which the efforts of one destroys that of the other. However, if you wish to focus on
the role of a public advocate, it would be prudent to join them in their open rejection of our
methods and limit any other guerilla activities.
Another kind of advocate is less public: to promote guerilla OA as well as legal OA among your
friends, colleagues, and those who have the skills to be of use to the movement. Of particular
importance are efforts to convert the casual pirate into the open access guerilla; from someone
who copies content only for their own consumption to someone who recognizes the value of a
more active and altruistic participation in the movement.

The Prospector

The role of Prospector is that of the scout for the movement. A Prospector identifies databases or
collections of interest to the movement and collects information about its workings. What kind
and how much content is there? How is it organized? What metadata is provided? What is the
URL structure for the database? What is required for access and who provides the service? And
so on.

We need to design a good system for compiling and sharing information of this nature among us,
so that that the Armoror, the Sapper, or the Traitor can do their work.

The Scribe

The Scribe is a unique role. Scholars and collectors of all kinds have massive collections of
material obtained by photographing, scanning, or transcribing documents or assembling other
digital assets. The resulting digitized content often sits on their own hard drives and are used in
only one or a few publications or exhibits.

A Scribe in the movement is conscious of the importance of their videos, images, sounds, and
digital archive photos. When reasonable, the Scribe collects or takes photos that go beyond their
own limited interest, or transcribes or indexes materials that may be of interest to others. They
organize their information to the extent possible and make efforts to share their files widely.
Materials not protected by copyright are to made public, directly posted online as public domain.
When materials are suspected of being protected by copyright, they are distributed through other
means or deposited with a Custodian. The Scribe is one of the roles that must take particular note
of the responsibilities that go along with the safety and privacy of individuals and communities
affected by the contents of the materials they digitize.

To facilitate the full integration of the Scribe into the movement, we must work towards better
systems of making these collections easy to share, and a standard for describing and organizing
them.

The Courier

The Courier is usually someone who has received collections of materials from another OA
guerilla. They do not merely use the materials themselves, but recognize their obligation to help
further share and distribute the materials widely.

While taking measures to protect themselves, the Courier makes efforts to share the materials
online through torrents or other private repositories and servers, possibly coordinating these
efforts with a Custodian. They share copies of the collections on portable storage media
throughout their personal networks.

Another form of Courier plays the role of communication mediater between guerillas that should
work to be kept in isolation from eachother, such as between the Traitor and Prospector and the
Armorer.

The Innkeeper

The role of the Innkeeper is to manage the safe houses of our movement. We need safe and
secure places to communicate with eachother anonymously. Ideally, these places should be kept
isolated from any servers operated by the Custodian so that discovery of one does not
compromise the other. The Innkeeper may be willing to host the work produced by the Armorer,
various versions and updates of this cookbook, and other instructional materials.

An Innkeeper must be willing to maintain a communication network that will likely come under
attack from hired hackers, botnets, or directly by principled opponents. They must have
precautions in place to destroy anything that might betray the identity of our members. They
should have plans to rapidly reproduce the network at a new location when taken down. They
must lead efforts to detect moles and informants within the network and deny them access.

The Armorer

The Armorer is one of the most important roles in our movement. They create, maintain, and
supply our movement with the weapons we need to carry out our raids. They write and update
the scrapers to liberate content. They create the processing scripts to organize our files, and they
design the protective measures that conceal our efforts.

In the past, the open access guerilla has often been an Armorer, a Traitor, and Custodian all in
one. One huge disadvantage to this is that the Armorer who is also the Traitor cannot easily share
their tools without potentially coming under scrutiny for launching raids themselves. If
discovered, as Custodian, their liberated materials are potentially surrendered and lost.

We must work together. If an armorer works together with Prospectors to identify targets and
design scrapers, but maintains some distance from (or at least communicates anonymously with)
the Traitors who will deploy them, we will stand a better chance of a successful operation. The
Armorer may choose to work openly, if they protect connections to others in the movement.
Writing a scraper is not necessarily a crime, but it is strongly suggested that efforts are made to
limit distribution to a trusted network, at least until an operation is complete. This will delay any
countermeasures by content distributors. More broadly, however, Armorers should be willing to
share, through work such as adding recipes to this document, tutorials on general approaches to
scraping databases and archives.

The Sapper
The Sapper is a special kind of Armorer. The Armorer serves primarily the Traitor, who will
deploy scrapers from within a paywall or behind restrictive terms of service. The Sapper
explores ways to infiltrate the security of archives and databases and enable outsiders direct
access. They may hack servers directly in order to enable a full and immediate grab of the
databases within. They may create access tunnels for a more cautious silent raid from the outside.
Or, in the most extreme case, they may bring about a temporary destruction of security to allow
large numbers of users to storm the database in a mass action.

The role of Sapper requires the greatest amount of skill and assumes the greatest amount of risk,
both to the Sapper and the movement as a whole. Sappers should carefully consider the
consequences of their actions and the impact on the ecology of content creation, curation, and
consumption. A Sapper's raid, depending on how it is carried out, can be a massive act of
sabotage, and has the greatest potential to generate anger and fear from our opponents but also
put pressure on our sympathizers. Use of it as a tactic should be carefully considered, and great
care taken in selecting the target, the timing, and the approach.

The Traitor

The Traitor is at the heart of the content liberation effort. They have legitimate and legal access
to content targetted for liberation and release what they take to the Custodians and Couriers.
They often depend on their special access to carry out their own daily tasks, and beyond legal
consequences, may sacrifice much if their actions are discovered and their access revoked.
Beyond this risk, the traitor often has conflicted loyalties. They have received their access in
trust, and by helping the guerilla open access movement, they are inevitably betraying that trust.

The Traitor may know especially well the great efforts required to fund, produce, curate, and
host large collections of data. Even as they liberate content, they may be concerned that their
actions will contravene the wishes and have some impact on people who may have only very
reluctantly accepted the restrictive copyrights and terms of use that have been forced upon the
product of their efforts by the institutions they serve. If you work with a Traitor, be sensitive to
these concerns, and respectful of what may seem like arbitrary limitations they wish to place on
the scale and nature of their cooperation with the movement.

As the one opening the gate, the risky work of the Traitor should ideally be carried out in total
secrecy. They should communicate securely and anonymously with other members of the
movement, and limit their other roles. They do not need the Armorer's technical skills if they can
obtain (indirectly, through a Courier, or directly and anonymously from a movement resource)
the necessary scrapers and other tools produced by the Armorers. It is important, however, for
them to become familiar with security measures to protect their identity and conceal their
liberation efforts. They should also learn enough about the scrapers etc. that they use so that they
can run them on their own computers and adhere to the basic scraper guidelines (below).

The Custodian

Traitors or Sappers who have liberated content should move as quickly as possible to deposit this
content with Custodians. The role of the Custodian is first and foremost that of canonical
preservation. They also play an important role as the primary distributor of content to Couriers.
They should also keep themselves informed about the safest and most effective means to widely
distribute content on file sharing networks, secret repositories, and by other means. When hosts
have been raided; copies of content have been taken down by legal authorities; or hosted copies
disappear through neglect (lack of seeders for torrents, etc.), it is the job of Custodians to take
measures to get the content to new sources.

Whenever possible the Custodian should check that the collections they received to not bear
traces of the origin, looking for signs of watermarked PDFs, scraper files with revealing login
information, or other signs that would reveal the identity of the guerilla who liberated the content.

Custodians should take precautions against their own discovery, and arrange for copies of
materials in their care to be deposited in a safe place should they be discovered. They should also
ideally limit actions in other roles of the movement to limit the risk of exposure.

The Archivist

The role of the Archivist is to preserve and improve the integrity of liberated content. They
identify missing or problematic metadata, they process and organize files, and potentially
provide conversions of problematic formats that materials are found in. If independent operations
liberate parts of collections, the Archivist can help bring them together. They create note
documents to include in the distribution of liberated content which describes the scope of the
collection, identifies problems in the material, provides information on the originating source,
and suggests ways to cite it. They collaborate with Prospectors to identify further work that
needs to be done on already raided collections, and with Custodians to spread the best possible
version of a collection.

Archivists with strong digital skills can also create the tools and platforms, both local and hosted,
that will allow users to conveniently search and browse liberated content. A zip file full of PDFs
or movie files is an order of magnitute less useful than a collection which is well indexed,
annotated, and conveniently searchable.

The Sculptor

The Sculptor is someone who is willing to use and create something new with liberated content.
They analyze and study collections for use in their own work. They produce new works of art
and culture. They remash, reproduce, and transform liberated content. They generate innovative
ways for others to manipulate and use content.

Ideally we should all be Sculptors. The value of the guerilla open access movement comes from
facilitating the Sculptors of today, tomorrow, and every day that content would otherwise be
locked away behind copyright and restricted use.

Scraper Guidelines
Scrapers are scripts designed to selectively extract content from servers. They are often written
in scripting languages such as Python, Ruby, or Perl that can be run from a variety of operating
systems. They are one of the most powerful tools of the guerilla open access movement.

When scrapers or other raiding tools are designed by the Armorer and deployed by the Traitor,
the two general principles of respect and concealment should be followed. For this purpose keep
these guidelines in mind:

   1. Minimize disruption to the host and their other users by
   2. Limiting the scale and speed of a raid appropriately and
   3. Employing methods to mask your raid as reasonable use of the target resource

More specifically:

   * Never attempt to grab an entire database in the space of hours or over a very short time
     span relative to its total size.
   * Generally limit the practice of multiple simultaneous downloads from a server.
   * If a small number of simultaneous downloads are made, carry them out from multiple
     network locations and ideally with multiple access credentials.
   * Do not place others at risk by using their network access credentials unless they
     understand the potential consequences and volunteer.
   * Employ proxies, MAC spoofing, and other measures as needed to conceal access
     locations when this is possible.
   * Use random intervals between downloads to simulate human behavior.
   * Limit the operation of scrapers to certain hours to simulate human behavior or bury
     activity in periods of large regular traffic.
   * Design scrapers to download materials randomly (while logging completed downloads)
     rather than in sequence, or else in random groups of smaller sequences consistent with
     human behavior.

Recipes
The remainder of this cookbook should be composed of recipes. These may include instructions
for the use of or the code for scrapers and other tools that are appropriate for wider distribution.
They may include tutorials and descriptions of good practices for the various roles of the
movement. They may describe appropriate security measures. They may recount past victories
and failures of the movement–but in a way that does not compromise anyone's identity. They
should ideally not include any direct links to online resources, but may provide suggestions on
how to use search techniques to find them, as they may often move. Under each recipe, include
the date it was written and an optional author pseudonym. If you distribute an edited version of
this document, update the timestamp and version information at the bottom for the cookbook as a
whole.

Securing Communication
By yellowElephant

Security in communication between members of the movement is of great importance. While
legal authorities will want to investigate our actions, perhaps a greater threat is that publishers
and content industries with significant resources will want to uncover who we are and
can easily outsource that work to hired hackers.

It is important to follow some basic guidelines that can be grouped as follows:

   1.   Create barriers of separation between your regular activity and movement activity
   2.   Mask your identity
   3.   Mask your location
   4.   Encrypt your information
   5.   Limit your circles of trust

Separate your Movement Activity

The first principle is a general one that you should always keep in mind. Whenever possible,
create a separate sphere for all things concerned with your activity in the movement. Some of
these things are just common sense. If you email about the movement, do so from a separate
email account, not your regular personal email account. If you tweet about the movement, then
unless you are only an Advocate who is not connected with illegal operations of the movement,
tweet from a separate account. If you write about the movement on a website or other online
service (again unless you are only an Advocate) then do so from accounts set up for the purpose.

If you need to set up online service accounts that require email verification use email addresses
that come from an online email service which does not require you to provide further means of
identification. This secures your anonymity as long as you mask your location.

When you are working on something related to the movement, even if you have masked your
location, avoid doing other things online that may associate your IP address with other activity
online and thus make it possible to trace back to you.

Consider using a different browser for all your movement activity. Or, at the very least, use
"privacy" or "secure" or "incognito" mode in your browser. This will prevent your browsing
history from being saved. More importantly, it keeps the cookies of your regular sessions away
from your movement activity. If you are logged into a social networking service and then start
doing movement activity without this layer of security in another tab or window, the website
you are using may have ways to identify you through cookies.

Mask Your Identity

If you are a Courier or an Advocate, what you write and do will be exposed to the scrutiny of
those outside the movement. Adopt a code name, but be sure to choose one that cannot be
associated with you, even by friends. Thus, if you are a famous dog trainer, don't choose a code
name of a breed of dog.

What you write can be subject to automatic text analysis. Authorship can be compared by means
of algorithms. Try to vary your writing style. Make a list of idiosyncratic adjectives or other
turns of phrase that you only use in some texts you write but not others. Write verbosely in some
places, and in a short blunt manner in another. Use leet speak in some contexts, and
grammatically correct language elsewhere. If you are exposed, this will help prevent you from
being associated with all of your activity.

Mask Your Location

This is very important. Do not engage in movement activity from your own IP address (your
identifying address on the internet) and spoof your MAC (the hardware address on your network
or wireless card). If you are connected to a University of Vienna computer network, and you do
anything online without masking your location, it is easy to trace the activity back to the
university network, which will likely have a log of who is registered to use the IP, or at least the
rough physical location that the person connected from. From there it is either direct discovery or
discovery with subsequent surveillance.

Mask your IP

Use a free or paid VPN (virtual private network) service that cannot be traced back to you
personally. Make sure to configure it so that all traffic to and from your computer is routed
through this VPN. Connected to a VPN in Russia, a user on the network in Vienna will appear to
be connecting from Russia.

Spoof your MAC

This is easily done with a little bit of experience on the command line. There are many utilities
that can help you do this on Windows. On Linux or OS X, simply open a terminal and enter the
appropriate command, which you can find in many places online.

Fool Your Enemy

If, as a Courier for example, you engage in movement activity that may give indirect hints about
your geographical location, then cloud your adversaries' view. If you have a Twitter account and
are based in Canada, follow a set of users that might imply you are in France and read French. If
you are emailing users in the UK, consider doing it from a free email service based in Germany.

Encrypt Your Communication

All movement related activity through email and chat should be secure or anonymous and
generally both.

Email and General Encryption

Learn about GnuPG and the basics of public-private key encryption. Create an encryption key
for yourself. Make sure the passphrase is very long. "The yellow elephant flew effortlessly
behind the old barn" is far more secure than "&%tX90!" and easier to remember. Export the
public key as ASCII armored and give it to your movement contacts. They will use your public
key to encrypt email they send to you. You will use your corresponding secret key to decrypt the
email they sent you that was encrypted by your public key. The process is reversed when you
email them. You will need their public key to email them. Sign your communications with your
key to help confirm your identity.
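
The passphrase comparison above can be made concrete with a rough entropy estimate. The sketch below is only an illustration under stated assumptions: it treats the short password as 7 symbols drawn at random from the 95 printable ASCII characters, and the long passphrase as 9 words drawn at random from a word list of 7776 entries (the size of a Diceware list). Both list sizes and both helper names are assumptions, not part of the original recipe. Under those assumptions the short password comes to about 46 bits and the nine-word phrase to about 116 bits. A grammatical sentence is more predictable than randomly chosen words, so treat the second figure as an upper bound.

```ruby
# Rough upper-bound entropy estimates, in bits.
PRINTABLE_ASCII = 95    # assumption: symbols drawn at random from printable ASCII
DICEWARE_WORDS  = 7776  # assumption: words drawn at random from a Diceware-sized list

# Bits of entropy for a string of randomly chosen characters.
def random_chars_bits(length, alphabet_size = PRINTABLE_ASCII)
  length * Math.log2(alphabet_size)
end

# Bits of entropy for a phrase of randomly chosen words.
def random_words_bits(word_count, list_size = DICEWARE_WORDS)
  word_count * Math.log2(list_size)
end

short  = "&%tX90!"
phrase = "The yellow elephant flew effortlessly behind the old barn"

puts "short password:   %.1f bits" % random_chars_bits(short.length)
puts "nine-word phrase: %.1f bits" % random_words_bits(phrase.split.length)
```

Each additional bit doubles the work of a brute-force search, which is why length matters more than unusual symbols.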

Secure Chat

Consider using anonymous communication on a pre-agreed IRC chat for something that is
simple and fast and can be performed directly in your (secure/incognito) browser (while
connected to VPN). There are also many plug-ins to secure communication on popular chat
protocols, as well as some dedicated secure chat clients. Make sure that neither you nor the
person you are speaking with has the client configured to log the communication.

Keep Movement Related Files Secure

If you are apprehended, all your computers will be taken. Depending on the jurisdiction, you may
not be legally compelled to provide passwords, so if you are smart, you will keep movement files
completely secure.

You can encrypt your files and directories directly with GnuPG. Another common method is to
create an encrypted partition or "disk image". Mount this partition or disk image
when you need access to your movement files. Or have a separate encrypted hard drive that you
attach when you are ready to do work for the movement.

Limited Circles of Trust

Do not tell all your cool friends that you are active in the movement. Limit this information to
people you trust and who you believe can be recruited to active participation in the movement.
Whenever possible, maintain the segregated guerilla cell structure that the movement has operated
under up until now. If you recruit several people to work with you, do not pass on information
about their identity or contact info to the person who recruited you. If you have contact with
Couriers, let your up-contact know of their existence so tasks to and from the Couriers can be
transmitted via you. One exception is competent Armorers. Their skills are in high demand, and if
you find them, consider keeping them separate from your own cell and "passing them up" the chain
to the person who recruited you, or put them in contact with an Armorer or Innkeeper you know of.
The scripts and tools they create should be distributed through the network securely and then, at
some appropriate point, openly. As far as this author is aware, we have no centralized structure.
We operate as a loose, cell-based movement, with anonymous cross-cell communication in online
forums and chat maintained by admins (Innkeepers) who don't usually know the identity of
anyone on the platforms they manage. Knowledge of personal real identities should, whenever
possible, not be available beyond the individual cell. Traitors (and the rarer Sappers) are to
be protected at all costs since they are directly liable to legal prosecution.

If there are people who you personally know but do not trust, yet who you think can be recruited,
use a Courier as an intermediary. The Courier establishes contact, and if it goes well, they can
"hand back" or "pass on" the contact once secure anonymous means of communication have
been established and a role identified.

Cells whose members personally know each other should take steps to ensure that keys were
exchanged in a way that guarantees they are genuine.

Limiting the circle of trust is key to the survival of the movement. In the past, many online hacker
networks have been broken when one member was discovered and agreed to cooperate in exchange
for a lighter sentence. A previously trusted person may then do all they can to expose
other cell members and escape their own punishment. Be suspicious of sudden attempts by
contacts in the movement to get more personal information from you. Again, by keeping the
circle of trust small, damage can be contained.

Ways to Make Your Scrapers More Human
Below are some simple code snippets to give you ideas about how to design scrapers that will
behave more like humans, thus concealing to some degree your raid on a database. Of course, if
you conduct the entire raid from a single IP address, or you are always logging in with the same
user credentials, a careful analysis of server logs and traffic can lead to discovery. However, the
following may help fool the lazy server administrator who merely glances over access logs from
time to time.

In Ruby:

# => This method can be used to create some variety in pauses between grabbing
# => files from a server and thus simulate human behavior. You set a default
# => break time (here twoseconds, in reality a random time of 1-3 seconds) and
# => several rarer, longer breaks. You then indicate a probability for each of
# => these longer breaks to occur. For example, currently there is a 0.2% chance
# => that the scraper will take a roughly one hour break, a 0.5% chance it will
# => take a 30 minute break, and so on.
def randomBreak()

  #set the various times for a break (all in seconds):
  twoseconds=rand(1..3)
  fifteensec=rand(12..18)
  onemin=rand(50..70)
  fivemin=rand(250..350)
  tenmin=rand(500..700)
  thirtymin=rand(1600..2000)
  onehour=rand(3300..3900)
  #default break time:
  sleeptime=twoseconds
  #roll the dice:
  x=rand(1..1000)
  #small percentage chance that there is a longer break instead:
  if (1..2).member?(x) then sleeptime=onehour end
  if (3..7).member?(x) then sleeptime=thirtymin end
  if (8..18).member?(x) then sleeptime=tenmin end
  if (19..32).member?(x) then sleeptime=fivemin end
  if (33..50).member?(x) then sleeptime=onemin end
  if (51..120).member?(x) then sleeptime=fifteensec end

  puts "Sleeping "+sleeptime.to_s+" seconds."
  sleep sleeptime
end

randomBreak()
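
To see what these probabilities mean for the overall speed of a run, the same ranges can be written as a table and averaged. The sketch below is one possible restatement, not the author's code: `BREAK_TABLE`, `break_range_for`, and `expected_pause` are hypothetical names, the 880 in the last row is simply the share of rolls left over for the default 1-3 second pause, and `Array#sum` assumes Ruby 2.4 or later. Using the midpoint of each range, the average pause works out to roughly 31 seconds, so the rare long breaks dominate the default two-second pause.

```ruby
# Each row: [number of rolls out of 1000, range of seconds to sleep].
# The rows mirror the ranges used in randomBreak above.
BREAK_TABLE = [
  [2,   3300..3900],  # roughly one hour
  [5,   1600..2000],  # roughly thirty minutes
  [11,  500..700],    # roughly ten minutes
  [14,  250..350],    # roughly five minutes
  [18,  50..70],      # roughly one minute
  [70,  12..18],      # roughly fifteen seconds
  [880, 1..3]         # default: roughly two seconds
]

# Map a dice roll to the range of seconds it selects.
def break_range_for(roll, table = BREAK_TABLE)
  total = table.sum { |count, _| count }
  raise ArgumentError, "roll must be in 1..#{total}" unless (1..total).cover?(roll)
  upper = 0
  table.each do |count, range|
    upper += count
    return range if roll <= upper
  end
end

# Average pause in seconds, using the midpoint of each range.
def expected_pause(table = BREAK_TABLE)
  total = table.sum { |count, _| count }
  table.sum { |count, range| count.to_f / total * (range.min + range.max) / 2.0 }
end

puts "Average pause: %.2f seconds" % expected_pause
puts "Pauses that fit in an 8 hour session: %d" % (8 * 3600 / expected_pause)
```

The second figure ignores the time spent on the downloads themselves, so it is an upper bound on the number of files fetched in a session.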

#!/usr/local/bin/ruby

# => This method can be used to force your script to only operate within
# => certain "working hours" and thus simulate human behavior. If you have a
# => scraper pulling files 24 hours a day, anyone inspecting server logs closely
# => will immediately know automated downloading is happening. Use this script
# => to make it at least somewhat more plausible that a very diligent human
# => being was downloading files from their favorite database.
# =>
# => Call this method once each time a file has been downloaded completely,
# => before proceeding to the next round of the loop. Only proceed if the
# => method returns true. If it is "outside working hours" it will return
# => false. You can use a while loop to wait until working hours.

# fromgmt - how many hours earlier or later than GMT timezone you wish to operate in
# starthour, endhour - first and last hour (0-23) of operation; all of endhour is included

def workinghours(fromgmt=0,starthour=9,endhour=17)
  #The current hour at GMT timezone:
  currentgmt=Time.new.gmtime.hour
  #The current hour at timezone of desired operation, wrapped into 0-23:
  currenthour=(currentgmt+fromgmt)%24
  #we assume here working hours do not cross midnight, anyone want to redo the
  #following to account for that possibility?
  if currenthour>=starthour && currenthour<=endhour
    return true
  end
  return false
end

#how we might use this:
while !workinghours(1,1,11)
  # wait one minute and check the time again
  sleep 60
end
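
The comment inside workinghours asks how to handle working hours that cross midnight. One possible answer is sketched below. It is a suggestion rather than part of the original recipe: `local_hour` and `within_hours?` are hypothetical helper names, and the hour is passed in as an argument so the logic can be checked without waiting for the clock. When starthour is later than endhour, the window is treated as wrapping past midnight.

```ruby
# Convert an hour at GMT to the hour in the desired timezone, wrapped into 0-23.
# Ruby's % always returns a non-negative result for a positive divisor,
# so negative offsets work as well.
def local_hour(gmt_hour, fromgmt)
  (gmt_hour + fromgmt) % 24
end

# True if hour falls inside the window starthour..endhour (both inclusive).
# A window such as 22..3 is read as 22:00 through 03:59, crossing midnight.
def within_hours?(hour, starthour, endhour)
  if starthour <= endhour
    hour >= starthour && hour <= endhour
  else
    hour >= starthour || hour <= endhour
  end
end

# Same calling convention as workinghours, built on the two helpers above.
def workinghours_wrapping(fromgmt = 0, starthour = 9, endhour = 17)
  within_hours?(local_hour(Time.new.gmtime.hour, fromgmt), starthour, endhour)
end
```

For example, `workinghours_wrapping(1, 22, 3)` would return true between 22:00 and 03:59 in a timezone one hour ahead of GMT.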

#!/usr/local/bin/ruby

# => This very simple snippet can be used to set a maximum limit on
# => the time the script will run. Here again we simulate likely human
# => behavior. You can run the script comfortable that it will run
# => for a limited time.

# maxminutes - The number of minutes you want this script to run
maxminutes=120 # for example, 2 hours or 120 minutes
starttime=Time.new

loop do
  # THE BODY OF YOUR MAIN LOOP GOES HERE

  # Time.new-starttime is the elapsed time in seconds
  if Time.new-starttime>maxminutes*60 then
    break # out of your loop and wrap up the script...
  end
end

#!/usr/local/bin/ruby

# => If you use Mechanize in Ruby to do your scraping you can set the "user
# => agent," that is, the browser that you are pretending to be when you grab
# => files from the server. You can add some noise to your activity by choosing
# => a random user agent each time you use the script.

def randUserAgent()
  #returns a random user agent but weighs towards popular browsers
  #problem: it doesn't include Chrome or Opera. Sample is for Ruby 1.9, need
  #change for earlier versions of Ruby.
  return ["Windows IE 7","Windows IE 7","Windows IE 7","Windows IE 7",
          "Windows IE 6",
          "Windows Mozilla","Windows Mozilla","Windows Mozilla","Windows Mozilla",
          "Mac Safari","Mac FireFox","Mac FireFox","Linux Firefox"].sample
end

# => ...
myparser.user_agent_alias=randUserAgent()



Recipe: Overcome Built-In Limit on JSTOR Liberator
By FreeDam | Status: Works as of 2013.1.20

The Aaron Swartz Memorial JSTOR Liberator offers an elegant tool for uploading a single
publicly viewable JSTOR document to a memorial archive. The tool creates a cookie in your
browser when you use it and prevents you from making more than one contribution. Perhaps you
want to offer ten PDFs to the memorial? Maybe fifty?

To overcome the limit without repeatedly opening a new browser window, we can modify your
JSTOR Liberator bookmarklet code slightly:

javascript:var date=new Date();date.setTime(date.getTime()+(10*24*60*60*1000));var expires = "; expires="+date.toGMTString();document.cookie='jstorLib_count=0;'+expires+'; path=/';(function()%7Bvar%20s%3Ddocument.createElement(%27script%27)%3Bs.type%3D%27text/javascript%27%3Bs.src%3D%27http://aaronsw.archiveteam.org/js%27%3Bdocument.getElementsByTagName(%27head%27)%5B0%5D.appendChild(s)%3B%7D)()%3B

This merely sets the "jstorLib_count" to 0, thus allowing you to make multiple submissions. This
client side limit will be replaced by a server based limit at some point, but in the meantime, you
can use this modified bookmarklet to make multiple contributions easily.

This document is in the public domain.

Changes
2013.1.13 williwaw - original posted

2013.1.16 yellowElephant - added recommended security precautions for movement members

2013.1.19 kfogel - language

2013.1.19-20 williwaw - ruby.recipe add some ways to simulate human behavior in scraper

2013.1.20 FreeDam - Overcome Built-in limit on JSTOR Liberator

https://github.com/c0nt3nt/oagcookbook/blob/master/oagcb.md







