Archive for the ‘code’ Category

Freedom for Our Files: Code and Slides

Monday, May 16th, 2011

A two-day workshop, with both technical hands-on and idea-driven components. Learn to scrape data and reuse public and private information by writing custom code and using the Facebook API. Additionally, we’ll converse and conceptualize ideas to reclaim our data literally and also imagine what is possible with our data once it is ours!

Here are the slides and some of the code samples from the Freedom for Our Files (FFOF) workshop I just did in Linz at Art Meets Radical Openness (LiWoLi 2011).

The first one is a basic scraping demo that uses “find-replace” parsing to change specific words. (I’m including examples below the code.)

<?php

/* Basic scraping demo with "find-replace" parsing
* Owen Mundy Copyright 2011 GNU/GPL */

$url = "http://www.bbc.co.uk/news/"; // 0. url to start with

$contents = file_get_contents($url); // 1. get contents of page in a string

// 2. search and replace contents
$contents = str_replace( // str_replace(search, replace, string)
"News",
"<b style='background:yellow; color:#000; padding:2px'>LIES</b>",
$contents);

print $contents; // 3. print result

?>

Basic scraping demo with “foreach” parsing

<?php

/* Basic scraping demo with "foreach" parsing
* Owen Mundy Copyright 2011 GNU/GPL */
 
$url = "http://www.bbc.co.uk/news/"; // 0. url to start with

$lines = file($url); // 1. get contents of url in an array

$get_content = false; // not inside the headline yet
$data = "";           // collected lines

foreach ($lines as $line_num => $line) // 2. loop through each line in page
{
    // 3. if opening string is found (strpos can return 0, so compare with false)
    if (strpos($line, '<h2 class="top-story-header ">') !== false)
    {
        $get_content = true; // 4. we can start getting content
    }

    if ($get_content)
    {
        $data .= $line . "\n"; // 5. then store content until closing string appears
    }

    if (strpos($line, "</h2>") !== false) // 6. if closing HTML element found
    {
        $get_content = false; // 7. stop getting content
    }
}

print $data; // 8. print result

?>

Basic scraping demo with “regex” parsing

<?php

/* Basic scraping demo with "regex" parsing
* Owen Mundy Copyright 2011 GNU/GPL */
 
$url = "http://www.bbc.co.uk/news/"; // 0. url to start with

$contents = file_get_contents($url); // 1. get contents of url in a string

// 2. match title (non-greedy, and "s" lets the title span more than one line)
if (preg_match('/<title>(.*?)<\/title>/is', $contents, $title))
{
    print $title[1]; // 3. print result
}

?>

Basic scraping demo with “foreach” and “regex” parsing

<?php

/* Basic scraping demo with "foreach" and "regex" parsing
* Owen Mundy Copyright 2011 GNU/GPL */

// url to start
$url = "http://www.bbc.co.uk/news/";

// get contents of url in an array
$lines = file($url);

// look for the string
$get_content = false; // not inside the headline yet
$data = "";           // collected lines

foreach ($lines as $line_num => $line)
{
    // find opening string (strpos can return 0, so compare with false)
    if (strpos($line, '<h2 class="top-story-header ">') !== false)
    {
        $get_content = true;
    }

    // if opening string is found
    // then store content until closing string appears
    if ($get_content)
    {
        $data .= $line . "\n";
    }

    // closing string
    if (strpos($line, "</h2>") !== false)
    {
        $get_content = false;
    }
}

// use regular expressions to extract only what we need...

// png, jpg, or gif inside a src="..." or src='...'
$pattern = "/src=[\"']?([^\"']?.*(png|jpg|gif))[\"']?/i";
preg_match_all($pattern, $data, $images);

// text from link
$pattern = "/(<a.*>)(\w.*)(<.*>)/ismU";
preg_match_all($pattern, $data, $text);

// link
$pattern = "/(href=[\"'])(.*?)([\"'])/i";
preg_match_all($pattern, $data, $link);

/*
// test if you like
print "<pre>";
print_r($images);
print_r($text);
print_r($link);
print "</pre>";
*/

?>

<html>
<head>
<style>
body { margin:0; }
.textblock { position:absolute; top:600px; left:0px; }
span { font:5.0em/1.0em Arial, Helvetica, sans-serif; line-height:normal;
background:url(trans.png); color:#fff; font-weight:bold; padding:5px }
a { text-decoration:none; color:#900 }
</style>
</head>
<body>
<img src="<?php print $images[1][0] ?>" height="100%">
<div class="textblock"><span><a href="<?php print "http://www.bbc.co.uk".$link[2][0] ?>"><?php print $text[2][0] ?></a></span><br>
</div>
</body>
</html>

And the example, which presents the same information in a new way…

Advanced scraping demo with “regex” parsing. Retrieves current weather in any city and colors the background accordingly. The math below for normalization could use some work.

<?php

/* Advanced scraping demo with "regex" parsing. Retrieves current
* weather in any city and colors the background accordingly.
* The math below for normalization could use some work.
* Owen Mundy Copyright 2011 GNU/GPL */

?>

<html>
<head>
<style>
body { margin:20px; font:1.0em/1.4em Arial, Helvetica, sans-serif; }
.text { font:10.0em/1.0em Arial, Helvetica, sans-serif; color:#000; font-weight:bold; }
.navlist { list-style:none; margin:0; position:absolute; top:20px; left:200px }
.navlist li { float:left; margin-right:10px; }
</style>
</head>

<body onLoad="document.f.q.focus();">

<form method="GET" action="<?php print htmlspecialchars($_SERVER['PHP_SELF']); ?>" name="f">

<input type="text" name="q" value="<?php print isset($_GET['q']) ? htmlspecialchars($_GET['q']) : ''; ?>" />
<input type="submit" />

</form>

<ul class="navlist">
<li><a href="?q=anchorage+alaska">anchorage</a></li>
<li><a href="?q=toronto+canada">toronto</a></li>
<li><a href="?q=new+york+ny">nyc</a></li>
<li><a href="?q=london+uk">london</a></li>
<li><a href="?q=houston+texas">houston</a></li>
<li><a href="?q=linz+austria">linz</a></li>
<li><a href="?q=rome+italy">rome</a></li>
<li><a href="?q=cairo+egypt">cairo</a></li>
<li><a href="?q=new+delhi+india">new delhi</a></li>
<li><a href="?q=mars">mars</a></li>
</ul>

<?php

// make sure the form has been sent
if (isset($_GET['q']))
{
// get contents of url in a string (urlencode makes the query safe for a url)
if ($str = file_get_contents('http://www.google.com/search?q='
. urlencode('weather in ' . $_GET['q'])))
{

// use regular expressions to extract only what we need...

// 1, 2, or 3 digits followed by any version of the degree symbol
// (alternation instead of a character class, so multi-byte symbols match)
$pattern = "/[0-9]{1,3}(?:º|°)C/";
// match the pattern with a C or with an F
if (preg_match_all($pattern, $str, $data) > 0)
{
$scale = "C";
}
else
{
$pattern = "/[0-9]{1,3}(?:º|°)F/";
if (preg_match_all($pattern, $str, $data) > 0)
{
$scale = "F";
}
}

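// remove html (if nothing matched there is no temperature to clean up)
$temp_str = isset($data[0][0]) ? strip_tags($data[0][0]) : "";
// remove everything except numbers and points
// (ereg_replace is deprecated, so use preg_replace)
$temp = preg_replace("/[^0-9.]/", "", $temp_str);

// compare with "" so that a reading of 0 degrees still counts as found
if ($temp !== "")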
{

// what is the scale?
if ($scale == "C"){
// convert ºC to ºF
$tempc = $temp;
$tempf = ($temp*1.8)+32;
}
else if ($scale == "F")
{
// convert ºF to ºC
$tempc = ($temp-32)/1.8;
$tempf = $temp;
}
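// normalize the number: map -20..120 ºF onto the 9 colors below (index 0..8)
$color = (int) round(($tempf + 20) / 140 * 8);
$color = max(0, min(8, $color)); // clamp temperatures outside the scale
// cool -> warm
// scale: -20 to 120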
$color_scale = array(
'0, 0,255',
'0,128,255',
'0,255,255',
'0,255,128',
'0,255,0',
'128,255,0',
'255,255,0',
'255,128,0',
'255, 0,0'
);

?>

<style> body { background:rgb(<?php print $color_scale[$color] ?>) }</style>
<div class="text"><?php print round($tempc,1) ."&deg;C " ?></div>
<?php print round($tempf,1) ?>&deg;F

<?php

}
else
{
print "city not found";
}
}
}
?>

</body>
</html>




For an xpath tutorial check this page.

For the next part of the workshop we used Give Me My Data to export our information from Facebook in order to revisualize it with Nodebox 1.0, a Python IDE similar to Processing.org. Here’s an example:

Update: Some user images from the workshop. Thanks all who joined!

Mutual friends (using Give Me My Data and Graphviz) by Rob Canning

identi.ca network output (starting from my username (claude) with depth 5, rendered to svg with ‘sfdp’ from graphviz) by Claude Heiland-Allen

Convert NTSC video to PAL with smooth motion

Sunday, January 16th, 2011

When converting NTSC digital video to PAL, the frame size needs to change from 720 x 480 (NTSC) to 720 x 576 (PAL). Depending on your project, the more important problem is the transition from 29.97 frames per second (NTSC) to 25 (PAL).

I found Final Cut Pro and QuickTime both convert 29.97 to 25 frames per second by cutting roughly five frames out of every second to make it fit. This results in a loss of temporal resolution, making motion in the footage jerk and skip because frames which created the illusion of motion are missing.
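
To make the frame arithmetic concrete, here is a rough Python sketch. It is only an illustration: the function names are made up, and neither function is how Final Cut Pro, QuickTime, or Compressor work internally. The first function picks the nearest NTSC frame for each PAL frame (plain frame dropping); the second shows the simplest smoother alternative, a linear blend of the two neighbouring NTSC frames.

```python
from fractions import Fraction

NTSC_FPS = Fraction(30000, 1001)  # 29.97 frames per second
PAL_FPS = Fraction(25)            # 25 frames per second


def nearest_source_frames(n_out):
    """NTSC frame index used for each PAL frame when frames are simply dropped."""
    return [round(i * NTSC_FPS / PAL_FPS) for i in range(n_out)]


def blend_weights(i):
    """Neighbouring NTSC frames and their mix for PAL frame i (plain linear blend)."""
    pos = i * NTSC_FPS / PAL_FPS   # exact position on the NTSC timeline
    lo = int(pos)                  # NTSC frame just before that position
    frac = pos - lo                # how far along we are towards the next frame
    return lo, lo + 1, float(1 - frac), float(frac)


kept = nearest_source_frames(25)               # one second of PAL output
dropped = sorted(set(range(30)) - set(kept))   # NTSC frames that are never shown
print("dropped NTSC frames:", dropped)
print("PAL frame 3 mixes NTSC frames:", blend_weights(3))
```

In this run the dropped frames are 3, 9, 15, 21 and 27, so a frame goes missing about every fifth of a second, which is the stutter described above. Blending keeps a bit of every frame instead.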

There are a few commercial applications that can convert NTSC to PAL with smooth motion, but I followed advice on this forum which suggested using Compressor for the standards conversion:

  1. Export an NTSC QuickTime movie from Final Cut Pro without compression
  2. In Compressor, select a DV PAL preset
  3. Turn on Frame Controls and set resizing and retiming to “better” or “best.”
  4. Run Compressor. This took >3 hours for 12 minutes of uncompressed footage.

This should give you a 720 x 576 (PAL CCIR 601) movie at 25 frames per second. Finally, in DVD Studio Pro make sure you choose PAL before you import any footage, and leave all the regions selected, which is the default.

UNIX: List open files

Saturday, January 15th, 2011


Can’t eject a CD or unmount an external hard drive on your Mac because of this error: “The disc is in use and could not be ejected. Try quitting applications and try again.”?

This UNIX command lists all open files and the applications that opened them. Open Terminal and substitute the name of your volume for “media” below to test.

$ lsof | grep /Volumes/media/

Thanks

Addendum: Here’s an even more helpful command: eject the disk with UNIX when the GUI won’t allow it.

$ diskutil eject [Mount Point|Disk Identifier|Device Node]

Oh, and a final tip nestled in this addition: If you have a space in the name (for example, you had two disks mounted named “backup” and OS X named the second one to mount “backup 1”) then you can easily reference the name (or any file or directory name with a space) with a backslash, which “escapes” the character. Typing the first few characters and then pressing the tab key will do it automatically.

$ diskutil eject /Volumes/backup\ 1/
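
If you build commands like this from a script, the same escaping question comes up. A small Python sketch using the standard `shlex` module shows what the backslash buys you. The volume name is just the example from above, and nothing here actually runs `diskutil`; it only splits the command text the way a POSIX shell would.

```python
import shlex

mount_point = "/Volumes/backup 1/"   # example volume name with a space

# Quote the path so the shell keeps it together as one argument
command = "diskutil eject " + shlex.quote(mount_point)
print(command)

# Split three spellings into arguments the way a POSIX shell would
quoted = shlex.split(command)
escaped = shlex.split(r"diskutil eject /Volumes/backup\ 1/")
unescaped = shlex.split("diskutil eject /Volumes/backup 1/")

print(quoted)      # path stays in one piece
print(escaped)     # same result with the backslash
print(unescaped)   # without escaping the path breaks in two
```

Quoting and backslash-escaping end up as the same single argument, while the unescaped version hands `diskutil` two separate pieces.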

Network graph grouping: A small art world

Thursday, December 16th, 2010

This “Mutual friends network graph,” created with Nodebox using data I exported with Give Me My Data, contains 540 “Facebook friends” and their connections to each other. When the graph renders it attempts to position people who have lots of connections closer together. With this you can see groups unfold based on your own social networks. Since I have spent more time in academia than I have at specific jobs, my “clusters” are based mostly on my academic history.

You can also see that there are a lot of connections between my high school and where I did my undergraduate study, which is based on the fact they are located very close to each other, so friends from high school also chose the same university or town to live in. There are also a lot of interconnections between Indiana University where I did my undergrad, the University of California, San Diego, where I did graduate study, and Florida State University, where I teach now. This is probably due to the fact that my connections are all within a given field, in my case visual arts, and points to the often expressed notion that “the art world is actually very small.”
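
The interconnections between groups can also be counted rather than eyeballed. Here is a minimal Python sketch, assuming you have a list of mutual-friend pairs and have labelled each person with a group yourself. The names, groups, and pairs below are invented sample data, not my actual export, and the function name is hypothetical.

```python
from collections import Counter

# Hypothetical sample data: which cluster each friend belongs to
group_of = {
    "ana": "high school",
    "ben": "high school",
    "cy": "undergrad",
    "dee": "undergrad",
    "eli": "grad school",
}

# Hypothetical mutual-friend pairs
edges = [("ana", "ben"), ("ana", "cy"), ("ben", "cy"), ("cy", "dee"), ("dee", "eli")]


def group_connections(edges, group_of):
    """Count friendships within each group and between each pair of groups."""
    counts = Counter()
    for a, b in edges:
        # Sort the two group names so (x, y) and (y, x) are counted together
        pair = tuple(sorted((group_of[a], group_of[b])))
        counts[pair] += 1
    return counts


counts = group_connections(edges, group_of)
for (g1, g2), n in counts.most_common():
    print(g1, "<->", g2, n)
```

A high count between two different groups is the numeric version of the dense bands of lines you see between clusters in the graph.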

Random Hacks of Kindness (RHoK) and Google Person Finder

Tuesday, December 14th, 2010

Last weekend I took part in Random Hacks of Kindness, an international hackathon dedicated to creating useful systems to respond to critical global challenges. I met with other programmers at the Betahaus in Berlin and worked with Tim Schwartz and Mikkel Gravgaard on Google Person Finder, a searchable database of missing persons that helps people find loved ones during disasters. It was used during the 2010 Haiti and Chile earthquakes and is developed by volunteers and employees of Google.

Photo by Flickr user rhokberlin

Photo by Flickr user nblr

Photo by Flickr user nblr

Give Me My Data and exporting mutual friends

Sunday, November 28th, 2010

On the one-year anniversary of the beginning of Give Me My Data I’m very happy to announce that you can now export your friends and your mutual friends from Facebook using two new formats. Both of the data formats are geared towards making graphs by displaying objects and their relationships. Needless to say, this has been the most often requested feature since the official beta launch in April 2010. See below for more information.

The DOT language

DOT is a plain text graph description language and can be rendered using a variety of layout applications like Graphviz or Tulip.

This example (saved as a plain text file with the .dot extension)

graph G
{
	a -- b -- c;
	b -- d;
}

Produces something like this
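
If you have your own list of pairs, writing DOT by hand is easy to script. Below is a minimal Python sketch that turns a list of friend pairs into the same kind of undirected graph. The function name and the sample pairs are hypothetical, and this is not the code Give Me My Data uses for its export.

```python
def to_dot(edges, name="G"):
    """Return an undirected DOT graph for a list of (a, b) pairs."""
    def quote(label):
        # Double-quoted IDs may contain spaces; escape any embedded quotes
        return '"' + label.replace('"', '\\"') + '"'

    lines = ["graph " + name, "{"]
    for a, b in edges:
        lines.append("\t%s -- %s;" % (quote(a), quote(b)))
    lines.append("}")
    return "\n".join(lines)


# Hypothetical friend pairs, standing in for an export
pairs = [("Ana", "Ben"), ("Ben", "Cy"), ("Ben", "Dee")]
dot_text = to_dot(pairs)
print(dot_text)
```

Save the printed text as a file with the .dot extension and open it in Graphviz or Tulip to lay it out.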

Python / Nodebox 1.0

The other file format is also for visualizing relationships. You can copy and paste the contents into a plain text file saved with a .py extension and open it in Nodebox, a Mac application that uses Python to create 2D visuals. Learn more about creating graphs in Nodebox.

Here’s an example file. My mutual friends exported from Facebook…

Keyword Intervention update

Sunday, October 24th, 2010

I launched Keyword Intervention in January 2007 and for almost four years now it has been scraping topical search terms and attracting random traffic. Today I moved the project to its own domain, keywordintervention.com and also updated the documentation on the site. Below is a sample of the last 500 search terms by users all around the world. The full list is here.

Plutonian Striptease VIII: Owen Mundy

Wednesday, October 20th, 2010

Originally published in Plutonian Striptease, a series of interviews with experts, owners, users, fans and haters of social media, to map the different views on this topic, outside the existing discussions surrounding privacy.

PS: Social networks are often in the news, why do you think this is?

OM: Assuming “social networks” refers to the online software, application programming interfaces (APIs), and the data that constitutes sites like MySpace, Facebook, and Twitter, I feel it’s popular to discuss them in the news for many reasons.

Online applications that enable enhanced connectivity for individuals and other entities are relatively new and there is an apparent potential for wealth through their creation and the connections they enable. News organizations are businesses, so they naturally follow the money, “reporting” on topics which are considered worthwhile to advertisers who buy space in their pages, pop-ups, and commercial breaks.

Additionally, the public is still grappling with the ability for online software to collect and distribute data about them, both with their permission and through clandestine means at once. Most users of social networking software don’t understand the methods or potential for behavior manipulation in these user interfaces and therefore are wary of what they share. Other users seem to be more care-free, making many private details from their lives public.

Finally, online social networking software is still evolving, so it’s difficult for users to establish a consensus about best practices. I believe the accelerating functionality of web 2.0 software will continue to complicate how we feel about online social networks for much longer.

PS: In what way do they differ from older forms of communication on the Internet?

OM: If web1.0 consisted of static pages, web2.0 is made up of dynamic information, generated by the millions of users accessing the web through personal computers and mobile devices. This rapid rise in user-generated content has been made possible by the development of online applications using a myriad of open source programming languages. Sites like youtube.com (launched 2005 and written primarily in Python) and Facebook.com (2004, PHP), which consist entirely of content contributed by users, store information in databases allowing for fast searching, sorting, and re-representation. Initially, the web consisted of information and we had to sift through it manually. Web2.0 allows for the growth of a semantic web and possibilities for machines to help us describe, understand, and share exponential amounts of data through tags, feeds, and social networks.

PS: Who is ultimately responsible for what happens to the data you upload to social networks?

OM: Obviously users are responsible for deciding what information they publish online. Still, Facebook’s “Recommended Privacy Settings” should emphasize more, not less. While their privacy settings always seem to be a work in progress, one thing they do consistently is default to less privacy overall, thus more sharing of your information on their site. For a website that depends on user-generated content the motivation to encourage sharing is clear enough. Still, why do they use the word “privacy” if they’re not actually embracing the idea?

I honestly feel that all software that accepts user input, as well as credit card and phone companies, should be bound by strict written rules preventing them from sharing my information with advertising companies or the government. It seems like a basic human right to me. If there are laws preventing me from downloading and sharing copyrighted music then there should be laws protecting my intellectual property as well.

PS: Do you read Terms of Use or EULAs and keep up to date about changes applied to them?

OM: Only when curious or suspicious. They’re usually intentionally full of so much legalese that I don’t bother torturing myself. But as an artist and programmer, I have an interest in sharing my information in public space because I benefit from its appreciation. Perhaps a more accurate answer to this question would come from someone who doesn’t have this interest.

PS: Do you think you’ve got a realistic idea about the quantity of information that is out there about you?

OM: Yes I do. I am definitely conscious of the information I share. In addition I also research methods of surveillance and incorporate that knowledge into my art practice. So while I haven’t seen the visualization that determines the likelihood that my grandmother is a terrorist threat, it’s guaranteed that one is possible with a few clicks and some multi-million dollar defense contractor dataveillance tool. This is true for any human being through aggregation of credit card records, travel information, political contributions, and what we publish online.

PS: How do you value your private information now? Do you think anything can happen that will make you value it differently in the future?

OM: It’s important to me to situate my art practice in public space where it can provoke discussion for all audiences. But yes, I do intentionally avoid distributing dorky pictures of my mountain bike adventures. Seriously though, I’ve been watching the news. I can say that I’m definitely alarmed by the post-9/11 surveillance of U.S. citizens.

PS: How do you feel about trading your personal information for online services?

OM: It depends on the service. We all have to give up something in order to use these tools. For example, without telling Google Maps that I’m interested in Mexican restaurants in Williamsburg, I might never find Taco Chulo. This continual paradox in making private information public is somewhat rendered void if the sites we use actually protect our information, but it is more likely that everything we say and do online is used to some degree to enhance advertisements. Here’s another example: 97% of Google’s revenue comes from advertising, which should suggest that while they produce software, their ultimate goal is to appeal to advertisers.[1]

PS: What do you think the information gathered is used for?

OM: I have a background in interface design and development so I know how great it is to use web stats to see where users are clicking. If traffic is not moving in the direction that you want then you can make specific buttons more prevalent.

I can only imagine what a company like Google does with the data they gather through their analytics tools. The fact that a government could access this information is scary when you think of the actions of past fascist states. The amount of control a government could wield through a combination of deep packet inspection and outright disregard for human rights is staggering.

PS: Have you ever been in a situation where sharing information online made you uncomfortable? If so, can you describe the situation?

OM: Definitely. Sharing financial information online always causes a little anxiety. One of my credit cards has been re-issued three times now due to “merchant databases being hacked.”

PS: What is the worst case scenario, and what impact would that have on an individual?

OM: I just moved to Berlin so I’m looking at the history of this place quite a bit. This is relevant because, during the Cold War, before Germany was reunited, the German Democratic Republic (GDR) Ministry for State Security (MfS) or ‘Stasi’ is believed to have hired, between spies and full- and part-time informants, one in every 6.5 East German citizens to report suspicious activities.[2] That’s millions of people. At this moment, the ratio of people entering data on Facebook to non-members is one in fourteen for the entire world.[3] We have probably the most effective surveillance machine in the history of mankind.

PS: Nowadays, most of the “reading” of what is written online is done by machines. Does this impact your idea of what is anonymity and privacy?

OM: Well, it’s not surprising the interview has come to this point, since I keep referencing the multitude of methods of computer-controlled digital surveillance. It’s true that machines have replaced humans for remedial work. For example: searching text strings for suspicious statements. But the ultimate danger to my privacy is only enhanced by machines. The real problem is when companies that I trust with my data decide to share it with corporations or governments that engage in behavior control.

PS: Can a game raise issues such as online privacy? And if so, what would you like to see in such a game?

OM: I find this question to be intentionally leading. Perhaps it’s because I’m generally optimistic and come from farmers, so I assume anything is possible? Not being a gamer though, I can tell you honestly that yes, it is possible, but you will have some challenges if you intend to reach an audience that doesn’t already agree with you. Reaching non-gamers who don’t already feel the same will be even tougher.

Games are generally immersive; you are either playing or you’re not. The biggest challenge you may have is reaching non-gamers, because they don’t generally invest large amounts of time in games for enjoyment. Try to find ways to highlight complexity and prompt discussion regardless of how long users play, and make this clear from the outset.

Finally, in politically-motivated cultural production it’s important to appeal to an audience first, and let them come to the issues on their own. Who would sit through a film knowing the twist at the end? Especially a conclusion intended to spur critical thinking and action, which is of course the goal.

[1] “Google Financial Tables for Quarter ending June 30, 2009” Retrieved October 13, 2010
[2] Koehler, John O. (2000). Stasi: the untold story of the East German secret police. Westview Press. ISBN 0813337445.
[3] “Facebook Statistics” Retrieved October 14, 2010

Facebook’s recommended privacy settings should emphasize more not less

Thursday, October 14th, 2010

Facebook’s “Privacy Settings” always seem to be a work in progress. One thing they do consistently is default to less privacy overall, thus more sharing of your information on their site. For a website that depends on user-generated content the motivation to encourage sharing is clear enough. Still, why do they use the word “privacy” if they’re not actually embracing the idea?

For example, a recent update introduces a table with degrees of privacy from less to more (left to right). Types of data are listed in rows, while access is shown in the columns, with Everyone to Friends Only, again left to right.

fb_more

Curious about what Facebook’s “Recommended” settings were, I clicked and am sharing the screenshot below. I am not surprised to see that they wish me to open up all content I generate (status messages, posts, images, etc.) and discourage allowing anyone I don’t know to comment on posts (probably as spam prevention).

fb_less

I have been thinking about privacy quite a bit this week, developing ideas for what next to do with Give Me My Data, and providing an interview about social media for Naked on Pluto (along with the likes of Marc Garrett and Geert Lovink). Plus I went to see the “geek hero story” The Social Network at the Babylon Cinema last night.

Anyway, after all this thinking about Facebook’s past, I’m curious about its future, and how it will continue to try to hold on to the #1 social networking website position that Friendster and MySpace lost so quickly. The API, games, etc could be expected, but the Facebook Connect tools that are so prevalent now, even on Yelp, a site I figured could make it without schlepping, were a surprise.

Facebook Connect, a jQuery “widget” that allows you to log in to other websites using your Facebook ID, is clever and eerie at once. It allows Facebook to track you when you are not even on their site, and make sure you stay loyal. If that sounds sinister, well, it is. What other purpose could there be for making available a service with the single purpose of mediating every interaction or bit of content you add to the web? It seems at first like OpenID, and it is, except that it’s run by a multi-billion dollar social media corporation.

Scrapy in process

Monday, September 27th, 2010

Picture 12

Picture 13

Picture 16