How to install Scrapy with MacPorts (full version)

Here is a step-by-step guide to how I got Scrapy running on my MacBook Pro (Mac OS X 10.5), using MacPorts to install Python and all required libraries (libxml2, libxslt, etc.). The following has been tested on two separate machines with Scrapy 0.10.

Many thanks to users here who shared some helpful amendments to the default installation guide. My original intention was to post this on Stack Overflow, but their guidelines discourage posting issues that have already been answered, so here it is…

1. Install Xcode with options for command line development (a.k.a. “Unix Development”). This requires a free registration.

2. Install MacPorts

3. Confirm and update MacPorts

$ sudo port -v selfupdate

4. “Add the following to /opt/local/etc/macports/variants.conf to prevent downloading the entire unix library with the next commands”

+bash_completion +quartz +ssl +no_x11 +no_neon +no_tkinter +universal +libyaml -scientific
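
Before starting the long Python build, it can help to confirm that the variants really made it into the file. The following is only a rough sketch: the helper names are hypothetical, and it assumes variants.conf is a plain list of whitespace-separated +variant / -variant tokens with # starting a comment.

```python
# Variants this guide adds to /opt/local/etc/macports/variants.conf.
WANTED = ("+bash_completion +quartz +ssl +no_x11 +no_neon "
          "+no_tkinter +universal +libyaml -scientific").split()


def parse_variants(conf_text):
    """Collect every +name / -name token found in the file text."""
    tokens = set()
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0]  # assumption: '#' starts a comment
        for token in line.split():
            if token[0] in "+-":
                tokens.add(token)
    return tokens


def missing_variants(conf_text, wanted=WANTED):
    """Return the wanted variants that are absent, in their original order."""
    present = parse_variants(conf_text)
    return [variant for variant in wanted if variant not in present]
```

Reading the file and calling missing_variants(text) should return an empty list once step 4 is done.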

5. Install Python

$ sudo port install python26

If for any reason you forgot to add the variants above, cancel the install and run “clean” to delete all the intermediate files MacPorts created. Then edit the variants.conf file (above) and install Python again.

$ sudo port clean python26

6. Change the reference to the new Python installation

If you type the following you will see a reference to the default installation of Python on Mac OS X 10.5 (Python 2.5).

$ which python

You should see this

/usr/bin/python

To change this reference to the MacPorts installation, first install python_select

$ sudo port install python_select

Then use python_select to change the $ python reference to the Python version installed above.

$ sudo python_select python26

UPDATE 2011-12-07: python_select has been replaced by port select, so…

To see the available Pythons, run

$ port select --list python

From that list choose the one you want and switch to it, e.g.

$ sudo port select --set python python26

Now if you type

$ which python

You should see

/opt/local/bin/python

which is a symlink to

/opt/local/bin/python2.6

Typing the following will now launch the Python 2.6 interactive shell (Ctrl+D to exit)

$ python
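
The same check can be made from inside Python, which is handy when several installations are present. This is a minimal sketch; describe_interpreter is a hypothetical helper name, and on some builds sys.executable may already be the resolved path rather than the symlink.

```python
import os
import sys


def describe_interpreter(executable=sys.executable):
    """Return the interpreter path and the file it resolves to after following symlinks."""
    return {
        "path": executable,
        "real_path": os.path.realpath(executable),
    }


if __name__ == "__main__":
    info = describe_interpreter()
    print("python version: %d.%d" % sys.version_info[:2])
    print("launched as:    %s" % info["path"])
    print("resolves to:    %s" % info["real_path"])
```

If the steps above worked, the resolved path should sit under /opt/local rather than /usr/bin.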

7. Install required libraries for Scrapy

$ sudo port install py26-libxml2 py26-twisted py26-openssl py26-simplejson

Other posts recommended installing py26-setuptools, but it kept returning errors, so I skipped it.

8. “Test that the correct architectures are present:

$ file `which python`

The single quotes should be backticks, which should spit out (for intel macs running 10.5):”

/opt/local/bin/python: Mach-O universal binary with 2 architectures
/opt/local/bin/python (for architecture i386): Mach-O executable i386
/opt/local/bin/python (for architecture ppc7400): Mach-O executable ppc

9. Confirm libxml2 library is installed (those really are single quotes). If there are no errors it imported successfully.

$ python -c 'import libxml2'
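
To check all four libraries from step 7 in one go, something like the following could be used. It is a sketch, not part of the official instructions: missing_modules is a hypothetical helper, and the import names (libxml2, twisted, OpenSSL, simplejson) are my assumption of what the four ports provide.

```python
# Import names assumed to match the ports installed in step 7.
REQUIRED = ["libxml2", "twisted", "OpenSSL", "simplejson"]


def missing_modules(names):
    """Try to import each module and return the names that fail."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    problems = missing_modules(REQUIRED)
    if problems:
        print("Not importable: %s" % ", ".join(problems))
    else:
        print("All required libraries imported successfully")
```

Run it with the MacPorts python; any name it lists points to a port that still needs installing.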

10. Install Scrapy

$ sudo /opt/local/bin/easy_install-2.6 scrapy

11. Make the scrapy command available in the shell

$ sudo ln -s /opt/local/Library/Frameworks/Python.framework/Versions/2.6/bin/scrapy /usr/local/bin/scrapy

One caveat: on a fresh computer you might not have a /usr/local/bin directory, so you will need to create it before you can create the symlink above.

$ sudo mkdir /usr/local/bin
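
For anyone scripting this step, the directory check and the symlink can be combined. A minimal sketch, assuming it is run with enough privileges to write to the link directory; ensure_link is a hypothetical name, and it deliberately leaves any existing file or link untouched.

```python
import os


def ensure_link(target, link_path):
    """Create link_path -> target, making the parent directory first if needed.

    Returns True if a new symlink was created, False if something already
    exists at link_path (it is left alone).
    """
    link_dir = os.path.dirname(link_path)
    if not os.path.isdir(link_dir):
        os.makedirs(link_dir)  # covers the missing /usr/local/bin case
    if os.path.lexists(link_path):
        return False
    os.symlink(target, link_path)
    return True
```

With the paths from step 11, the call would be ensure_link("/opt/local/Library/Frameworks/Python.framework/Versions/2.6/bin/scrapy", "/usr/local/bin/scrapy").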

12. Finally, type either of the following to confirm that Scrapy is indeed running on your system.

$ python scrapy

$ scrapy

One last bit… I also installed IPython from MacPorts for use with Scrapy

$ sudo port install py26-ipython

Then make a symbolic link

$ sudo ln -s /opt/local/bin/ipython-2.6 /usr/local/bin/ipython

An article on ipython
http://onlamp.com/pub/a/python/2005/01/27/ipython.html

ipython tutorial
http://ipython.scipy.org/doc/manual/html/interactive/tutorial.html

Was weiß Facebook über mich?, in Bild

The German newspaper Bild published another article mentioning Give Me My Data today.

Was weiß Facebook über mich? or, in English, What does Facebook know about me?

“About 500 million people worldwide use the social network Facebook to stay in touch with friends. In Germany, almost 9.8 million people are registered with Facebook. Many users are worried about their privacy. BILD.de answered the important questions…”

Setup Macports Python and Scrapy successfully

“Scrapy is a fast high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.”

But, it can be a little tricky to get running…

While attempting to install Scrapy on my MBP with the help of this post, I kept running into errors with the libxml2 and libxslt libraries when following the Scrapy documentation.

I wanted to let MacPorts manage all the libraries, but it kept referencing the wrong installation of Python. I began with three installs:

  1. The default Apple Python 2.5.1, located at: /usr/bin/python
  2. A previous version I had installed, located at: /Library/Frameworks/Python.framework/Versions/2.7
  3. And a MacPorts version, located at: /opt/local/bin/python2.6

My trouble was that:

$ python

would always default to 2.7 when I needed it to use the MacPorts version. The following did not help:

$ sudo python_select python26

I even removed the 2.7 version, which only caused an error.

I figured out I needed to add the MacPorts bin directory to my PATH using the following:

$ PATH=$PATH\:/opt/local/bin ; export PATH

And then reinitiate the ports, etc.
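
The reason PATH matters here is that the shell runs the first python it finds while scanning the PATH directories from left to right. The sketch below imitates that lookup so the order can be inspected; first_on_path is a hypothetical helper, and it skips empty PATH entries for simplicity. Note that a directory added at the end of PATH only wins if no earlier directory contains a command with the same name.

```python
import os


def first_on_path(name, path_value):
    """Return the first executable called `name` found along path_value, or None."""
    for directory in path_value.split(os.pathsep):
        if not directory:
            continue  # simplification: ignore empty entries
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    print(first_on_path("python", os.environ.get("PATH", "")))
```

If this prints something other than /opt/local/bin/python, another installation is still ahead of MacPorts in the search order.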

Finally, these instructions did not make scrapy-ctl.py available by default, so I had to reference the file directly by its full path:

/opt/local/Library/Frameworks/Python.framework/Versions/2.6/bin/scrapy-ctl.py

UPDATE

A quick addendum to this post with instructions to create the link, found on the Scrapy site (#2 and #3).

Starting with #2, “Add Scrapy to your Python Path”

sudo ln -s /opt/local/Library/Frameworks/Python.framework/Versions/2.6/bin/scrapy-ctl.py /opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scrapy

And #3, “Make the scrapy command available”

sudo ln -s /opt/local/Library/Frameworks/Python.framework/Versions/2.6/bin/scrapy-ctl.py /usr/local/bin/scrapy

How to easily set up a campaign finance database (well, kind of) or Make Python work with MAMP via MySQLdb

I’ve been trying for a few hours to run a Python script from The Sunlight Foundation Labs which downloads (and updates) a campaign finance database from the Center for Responsive Politics. See their original post for more information.

In the process of getting this working I accidentally broke a working copy of MySQL and overwrote a database installed on my MBP (which I had stupidly not backed up since last year). FYI, you can rebuild a MyISAM database from the original .frm, .MYD, and .MYI files if you (1) recreate the database in the new install of MySQL and (2) copy the files into the MySQL data folder.

I struggled quite a bit getting Python to work with MySQL via MySQLdb. I’m documenting some of the headaches and resolutions here in case they are useful. I’ve tried to include error messages for searches as well.

The Sunlight Foundation instructions require Python and MySQL, but don’t mention you have to have already wrestled with the madness involved in installing Django on your machine. Here is what I did to get it working on my MacBook Pro Intel Core 2 Duo. I’ve included their original instructions with my own (and a host of others).

Instructions

  1. Install MAMP.

    While I had working installations of MySQL and Python (via installers on respective sites), I couldn’t get Python to connect to MySQL via MySQLdb. I decided to download and try MAMP for a clean start.

  2. Install Xcode

    Past versions are available on the Apple Developer website.

  3. Install setuptools

    Required for the MySQLdb driver. Remove the .sh extension from the filename (setuptools-0.6c11-py2.7.egg.sh) and in a shell:

    ~$ chmod +x setuptools-0.6c11-py2.7.egg
    ~$ ./setuptools-0.6c11-py2.7.egg

  4. Install the MySQLdb driver

    After downloading and unzipping, from the directory:

    ~$ python setup.py build
    ~$ sudo python setup.py install

    Continue following the advice in the post “How to install Django with MySQL on Mac OS X” to the end.

    I also followed another piece of advice, from “Python MySQL on a Mac with MAMP”, to change the mysql_config.path from:

    /usr/local/mysql/bin/mysql_config

    to

    /Applications/MAMP/Library/bin/mysql_config

    Especially useful is his test script for making sure that Python is indeed accessing MySQL.

  5. Create a symbolic link between Python and MySQL in MAMP

    This is required in order to use a socket to connect to MySQL. See “How to install MySQLdb on Leopard with MAMP” for more information.

    ~$ sudo ln -s /Applications/MAMP/tmp/mysql/mysql.sock /tmp/mysql.sock

  6. Create a directory and put the two Python files in it.
  7. Modify the top of the sun_crp.py file to set certain parameters–your login credentials for the CRP download site and your MySQL database information.
  8. Install pyExcelerator

    Error: ImportError: No module named pyExcelerator

    I had to install this module next.

  9. Comment out multiple lines

    Error: NameError: name 'BaseCommand' is not defined

    In download.py comment out the following:

    The line: from django.core.management.base import BaseCommand, CommandError

    Everything from class CRPDownloadCommand(BaseCommand): to the end of the document.

  10. From the command line, run the script from the proper directory by typing: python sun-crp.py.
  11. It will take several hours to download and extract the data, especially the first time it’s run. But after that, you’re good to go.
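
The kind of test mentioned in step 4, confirming that Python can actually reach MAMP's MySQL through the socket from step 5, can be sketched roughly as follows. This is not the script from the linked post. The function names are hypothetical, the root/root credentials and the "mysql" database are assumptions about a default MAMP setup, and MySQLdb is imported inside the function so the file still loads when the driver is missing.

```python
# Socket path taken from step 5 above.
MAMP_SOCKET = "/Applications/MAMP/tmp/mysql/mysql.sock"


def mamp_connect_args(user="root", passwd="root", db="mysql",
                      socket_path=MAMP_SOCKET):
    """Build keyword arguments for MySQLdb.connect (credentials are assumed defaults)."""
    return {"user": user, "passwd": passwd, "db": db,
            "unix_socket": socket_path}


def server_version(connect_args):
    """Connect, ask the server for its version string, and close the connection."""
    import MySQLdb  # requires the driver installed in step 4
    connection = MySQLdb.connect(**connect_args)
    try:
        cursor = connection.cursor()
        cursor.execute("SELECT VERSION()")
        return cursor.fetchone()[0]
    finally:
        connection.close()


if __name__ == "__main__":
    try:
        print("Connected, MySQL version: %s" % server_version(mamp_connect_args()))
    except Exception as error:
        print("Could not reach MySQL: %s" % error)
```

If a version string prints, the driver, the socket, and the credentials are all working, and the same settings can go at the top of the script in step 7.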

Automata: Counter-Surveillance in Public Space paper on the Public Interventions panel at ISEA2010

isea2010_logo_klein

ISEA2010 RUHR Conference in Dortmund, Germany

P26 Public Interventions
Tue 24 August 2010
15:00–16:30h
Volkshochschule Dortmund, S 137a
Moderated by Georg Dietzler (de)

  • 15:00h | Owen Mundy (us): Automata: Counter-Surveillance in Public Space
  • 15:20h | Christoph Brunner (ch/ca), Jonas Fritsch (dk): Balloons, Sweat and Technologies. Urban Interventions through Ephemeral Architectures
  • 15:40h | Georg Klein (de): Don’t Call It Art! On Artistic Strategies and Political Implications of Media Art in Public Space
  • 16:00h | Georg Dietzler (de): Radical Ecological Art and No Greenwash Exhibitions

About my talk:

Automata is the working title for a counter-surveillance internet bot that will record and display the mutually beneficial interrelationships between institutions of higher learning, the global defense industry, and world militaries. Give Me My Data is a Facebook application that helps users reclaim and reuse their Facebook data. The two projects, both ongoing, address important issues surrounding contemporary forms of communication, surveillance, and control.

Recent and ongoing projects

Howdy, it’s been a while since I last shared news about recent and ongoing projects. Here goes.

close-your-eyes-ac-direct-me-14_1000h

1. You Never Close Your Eyes Anymore

You Never Close Your Eyes Anymore is an installation that projects moving US Geological Survey (USGS) satellite images using handmade kinetic projection devices.

Each device hangs from the ceiling and uses electronic components to rotate strips of satellite images on transparency in front of an LED light source. They are constructed with found materials like camera lenses and consumer by-products and mimic remote sensing devices, bomb sights, and cameras in Unmanned Aerial Vehicles.

The installation includes altered images from various forms of lens-based analysis on a micro and macro scale: land masses, ice sheets, and images of retinas, printed on reflective silver film.

On display now until July 31 at AC Institute 547 W. 27th St, 5th Floor
Hours: Wed., Fri. & Sat.: 1-6pm, Thurs.: 1-8pm

New video by Asa Gauen and images
http://owenmundy.com/site/close_your_eyes

2. Images and video documentation of You Never Close Your Eyes Anymore will also be included in an upcoming Routledge publication and website:

Reframing Photography: Theory and Practice
by Rebekah Modrak, Bill Anthes
ISBN: 978-0-415-77920-3
Publish Date: November 16th 2010
http://www.routledge.com/books/details/9780415779203/

gmmdlogo

3. Give Me My Data launch

Give Me My Data is a Facebook application designed to give users the ability to export their data out of Facebook for any purpose they see fit. This could include making artwork, archiving and deleting your account, or circumventing the interface Facebook provides. Data can be exported in CSV, XML, and other common formats. Give Me My Data is currently in public-beta.

Website
http://givememydata.com/

Facebook application
http://apps.facebook.com/give_me_my_data/

logo_nyt

4. Give Me My Data was also covered recently by the New York Times, BBC, TechCrunch, and others:

Facebook App Brings Back Data by Riva Richmond, New York Times, May 1, 2010
http://gadgetwise.blogs.nytimes.com/2010/05/01/facebook-app-brings-back-data/

5. yourarthere.net launch

A major server and website upgrade to the yourarthere.net web-hosting co-op for artists and creatives. The new site allows members of the community to create profiles and post images, tags, biography, and events. In addition to the community aspect, yourarthere.net is still the best deal going for hosting your artist website.

Website
http://yourarthere.net

More images
http://owenmundy.com/site/design_yourarthere_net

americans_nwfsc_0033_1000w

6. The Americans

The Americans is currently on view at the Northwest Florida State College in Niceville, FL. It features a new work with the same title.

More images
http://owenmundy.com/site/the-americans

bb101_schematic_oblique

7. Your Art Here billboard hanger

I recently designed a new billboard hanging device and installed it in downtown Bloomington, IN with the help of my brother Reed and my wife, Joelle Dietrick.

Stay tuned here for news about Your Art Here and the new billboard by Joelle Dietrick.
http://www.facebook.com/pages/Your-Art-Here/112561318756736

lockheedmartin.com_sitemap_20091214_red_800w

8. Finally, I am moving to Berlin for a year on a DAAD fellowship to work on some ongoing projects, including Automata.

More images
https://owenmundy.com/blog/2010/07/new-automata-sitemaps/

I’ll be giving a paper about Automata at the upcoming ISEA2010 RUHR conference in Dortmund, Germany.
http://www.isea2010ruhr.org/conference/tuesday-24-august-2010-dortmund

Many thanks to Chris Csikszentmihályi, Director of the Center for Future Civic Media http://civic.mit.edu/ , for inviting me to the MIT Media Lab last August to discuss the project with his Computing Culture Group: http://compcult.wordpress.com/

You Never Close Your Eyes Anymore @ AC Direct

close-your-eyes-ac-direct-me-19_1000h

You Never Close Your Eyes Anymore opens tonight at AC Institute in Chelsea.

July 1 – July 31, 2010
Opening: Thursday, July 1, 2010 6-8pm

AC Institute [Direct Chapel]
547 W. 27th St, 5th Floor
New York, NY

Gallery Hours: Wed., Fri. & Sat.: 1-6pm, Thurs.: 1-8pm

