Tuesday, July 19, 2011

Geez, Louise, people, that's stealing.

There is a big brouhaha in some leftist circles today as Internet activist Aaron Swartz was indicted by the U.S. government on a variety of charges. According to the email I got from Demand/Progress, "As best as we can tell, he is being charged with allegedly downloading too many journal articles from the Web. The government contends that downloading so many journal articles constitutes felony computer hacking and should be punished with time in prison."

These people didn't read the indictment.  They could not have.  I, on the other hand, did.

What Swartz is accused of is not "downloading too many articles."  He is charged with breaking into a locked cabinet at MIT (he has no association with the Institute, being a Fellow at Harvard's Institute for Ethics; I think there is a self-writing joke, right there), obtaining access through MIT computers which he had no right to use, and accessing the contents of JSTOR, a subscription service for reading and downloading scientific journal articles.

He is then alleged to have downloaded 4.8 million articles from JSTOR, a large portion of their library.  1.7 million of these were potential income makers for JSTOR, being available for purchase from independent publishers through JSTOR's Publisher's Sales Service. All the while, he allegedly took intricate steps to avoid being caught and stopped.  The intent was to make the information free to the public.

Mr. Swartz is entitled to the presumption of innocence, and I make no stand on whether the charges are true.  Nor do I want to wade into the murky waters of the philosophy behind copyright law, in large part because I lack the knowledge to do so. Nor do I want to discuss the computer hacking charges in detail, because, again, I lack enough knowledge of computers and the law to make reasonable assessments of the situation.

No, what I am seriously annoyed at is this assumption that any time the government goes after someone for computer crimes, the charges are automatically bogus.  Not to mention the incredible misrepresentation of the indictment contained in the Demand/Progress email.

Free information on the Web is a lot like Marxism: it's a great idea in theory, but ignores a lot of real-world considerations.

Yes, services like JSTOR (and LEXIS/NEXIS) are expensive.  Yes, this means sometimes the information is only available through large institutional libraries, and sometimes you have to pay to access them, depending upon the library's policies.

Yes, these are for-profit enterprises.**  That does not mean that they are parasites, squirreling away materials that would otherwise be at the public's fingertips.  A lot of these articles are in scientific journals, and would no more be free online from the publisher than they would be were you to buy a copy of the journal.  Publishers of scientific journals are paid for their work.  Without the journals, where would researchers publish their results? If they did not exist, the exchange of knowledge among researchers, within and across fields, would come to a screeching halt.

[Edited to add: it should be noted here that the Rocket Scientist disagrees vehemently with me on all this.  "Scientific publishers are evil. When you publish a paper you have to sign away all your rights so that they can make money off selling your work to libraries or database companies." The Resident Shrink then pointed out that she is prohibited from posting one of her scholarly papers on the web.  I would argue that this is a different -- decidedly serious -- problem with the system above and beyond the issue presented in the Swartz case.]

Services like JSTOR are incredibly important to the free flow of scholarly information. They make research easier.  Aside from searching capabilities, library space is limited, and JSTOR and its like allow libraries to expand the information they offer sometimes by orders of magnitude. They can also be collators of information:  recently I used LEXIS/NEXIS Corporate Alliance to compile and download a list of all businesses in two cities that had gross annual revenues of more than a million, sorted by income. Could I have gotten this information on my own? Possibly.  Except that I would not have known where to start looking, and it would have undoubtedly taken a great deal longer than ten minutes.

There is a reason they are so expensive.  This access does not come free.  Rights and fee-sharing agreements with publishers have to be negotiated.  Aside from the actual payments, at some point this may mean legal costs. IP lawyers -- even in-house counsel -- are not cheap. All this information has to be stored somewhere, which means server time and space and IT professionals to make sure the servers stay up and running.  And all of this assumes that the information from the publishers is in electronic form to begin with, not anything that has to be entered or scanned.  In the case of the database services I use, someone has to do a lot of interpretation of data.  Someone has to write the search code and interfaces for users, and design, set up and maintain the website. (And users, even the most ardent "information should be free" people, get cranky when websites go down.  Who do they think fixes them, the Internet pixies?)

So, if the charges are true, what Swartz did was abscond with the hard-earned results of the work of a great many people, from publishers to all the people who work to make sure that information remains in a form that a lot of people can get to (certainly more people than would have access were services like JSTOR to vanish).

That, ladies and gentlemen, is theft.

**Not always: the fee-based databases I am currently using (at least until the trial subscription runs out) are from the Foundation Center, a nonprofit dedicated to helping other nonprofits. These databases are available for free at some 350 cooperating institutions throughout the country.


  1. I had to point this out to several people on FB who were howling about his 'rights' being trampled on. Um, no, breaking into a wiring closet and gaining physical access to a network is hacking, pure and simple, because one thing that gets drilled into every IT infrastructure geek is once someone has physical access, you're screwed.

    As far as breaking JSTOR's TOS and using scripts, actually planning on releasing the documents publicly, etc, yeah, it's totally stealing, just like if he had actually stolen books from a book store, but even worse, because electronic copies can be more widely distributed.

    I see the Rocket Scientist's and the Resident Shrink's points about how screwed up the scientific paper publishing system is, and the Cloud Physicist agrees with them, but yeah, that's a totally different issue.

  2. I can just imagine the IT professionals like you shuddering at the thought of zealots like Swartz getting physical access to their equipment.

    It still amazes me the number of people who believe in the "information should be free" utopia.