Set up Macports Python and Scrapy successfully

“Scrapy is a fast high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.”

But it can be a little tricky to get running…

While installing Scrapy on my MBP with the help of this post and the Scrapy documentation, I kept running into errors with the libxml and libxslt libraries.

I wanted to let Macports manage all the libraries, but it kept picking up the wrong Python installation. I started out with three installs:

  1. The default Apple Python 2.5.1, located at /usr/bin/python
  2. A version I had installed previously, located at /Library/Frameworks/Python.framework/Versions/2.7
  3. A Macports version, located at /opt/local/bin/python2.6
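With three interpreters installed, it helps to see which `python` the shell will actually pick. Here is a minimal, portable sketch that walks PATH in order and prints every executable named `python`; the function name `list_pythons` is just a name chosen for this example.

```shell
#!/bin/sh
# Print every executable named "python" on PATH, in lookup order.
# The first line printed is the interpreter a bare `python` command runs.
# The body runs in a subshell ( ... ) so the IFS/set -f changes don't leak.
list_pythons() (
  IFS=:     # split PATH on colons
  set -f    # don't glob-expand PATH entries
  for dir in $PATH; do
    if [ -f "$dir/python" ] && [ -x "$dir/python" ]; then
      echo "$dir/python"
    fi
  done
)

list_pythons
```

In bash, the built-in `type -a python` gives the same ordered list.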

The trouble was that running

$ python

always launched 2.7 when I needed the Macports version. The following did not help:

$ sudo python_select python26

I even removed the 2.7 version, which only produced an error.

I eventually figured out that I needed to put the Macports directory ahead of the others on my PATH:

$ export PATH=/opt/local/bin:$PATH

Then I reinitiated the ports and the rest of the setup.
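The shell runs the first matching `python` it finds on PATH, so /opt/local/bin has to come before the 2.7 framework directory; adding it at the end changes nothing. This self-contained sketch demonstrates that with two throwaway directories holding stub `python` scripts, which are hypothetical stand-ins rather than real Python installs.

```shell
#!/bin/sh
# The shell runs the FIRST matching command on PATH.
# Both directories are throwaway stand-ins, not real Python installs.
framework=$(mktemp -d)   # stands in for .../Python.framework/Versions/2.7/bin
macports=$(mktemp -d)    # stands in for /opt/local/bin

# Stub "python" scripts that only report which copy ran.
printf '#!/bin/sh\necho framework-2.7\n' > "$framework/python"
printf '#!/bin/sh\necho macports-2.6\n' > "$macports/python"
chmod +x "$framework/python" "$macports/python"

base="$framework:/usr/bin:/bin"

# Macports dir appended at the end: the 2.7 stand-in is still found first.
appended=$(export PATH="$base:$macports"; python)
# Macports dir prepended: the Macports stand-in wins.
prepended=$(export PATH="$macports:$base"; python)

echo "appended:  $appended"    # appended:  framework-2.7
echo "prepended: $prepended"   # prepended: macports-2.6

rm -rf "$framework" "$macports"
```

The export only lasts for the current shell session; to keep it, the same `export PATH=/opt/local/bin:$PATH` line can go in a shell startup file such as ~/.profile.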

Finally, following these instructions I still could not call scrapy-ctl.py by name, so I had to reference the file directly:

/opt/local/Library/Frameworks/Python.framework/Versions/2.6/bin/scrapy-ctl.py
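Until the links below are in place, a small shell function saves retyping the full path. This is only a sketch: `scrapy_ctl` and `SCRAPY_CTL` are hypothetical names, and the default path is the Macports location above. Running the file directly relies on it being executable; if its first line is `#!/usr/bin/env python`, it will also use whichever `python` comes first on PATH.

```shell
#!/bin/sh
# Hypothetical helper: call scrapy-ctl.py by its full path.
# SCRAPY_CTL defaults to the Macports location from this post;
# set it beforehand if your copy lives elsewhere.
SCRAPY_CTL=${SCRAPY_CTL:-/opt/local/Library/Frameworks/Python.framework/Versions/2.6/bin/scrapy-ctl.py}

scrapy_ctl() {
  if [ ! -f "$SCRAPY_CTL" ] || [ ! -x "$SCRAPY_CTL" ]; then
    echo "scrapy-ctl.py not found or not executable: $SCRAPY_CTL" >&2
    return 1
  fi
  # Pass every argument through unchanged.
  "$SCRAPY_CTL" "$@"
}
```

Added to ~/.profile, this lets you pass scrapy-ctl.py's usual arguments to `scrapy_ctl` instead.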

UPDATE

A quick addendum: here are the commands for creating the links described in steps #2 and #3 on the Scrapy site.

Starting with #2, “Add Scrapy to your Python Path”:

sudo ln -s /opt/local/Library/Frameworks/Python.framework/Versions/2.6/bin/scrapy-ctl.py /opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scrapy

And #3, “Make the scrapy command available”:

sudo ln -s /opt/local/Library/Frameworks/Python.framework/Versions/2.6/bin/scrapy-ctl.py /usr/local/bin/scrapy
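Plain `ln -s` fails with "File exists" if you run it a second time, and it happily creates a dangling link if the target path is mistyped. A minimal sketch of a safer variant follows; `make_link` is a hypothetical helper, not part of Scrapy or Macports. It checks that the target exists, replaces an existing link with `-f`, and prints where the link now points.

```shell
#!/bin/sh
# Hypothetical helper: create or refresh a symlink, then show its target.
make_link() {
  target=$1
  link=$2
  if [ ! -e "$target" ]; then
    echo "missing target: $target" >&2
    return 1
  fi
  # -f replaces an existing file or link at $link instead of failing.
  # (If $link is a real directory, ln puts the new link inside it.)
  ln -sf "$target" "$link" || return 1
  # Print the path stored in the link.
  readlink "$link"
}
```

For step #3, the arguments would be the scrapy-ctl.py path above and /usr/local/bin/scrapy, run from a shell with sudo rights since /usr/local/bin usually needs them.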
