“Scrapy is a fast high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.”
But it can be a little tricky to get running…
I wanted to let MacPorts manage all the libraries, but I had trouble with it referencing the wrong Python installation. I started out with three installs:
- The default Apple Python 2.5.1, located at /usr/bin/python
- A version I had installed previously, located at /Library/Frameworks/Python.framework/Versions/2.7
- A MacPorts version, located at /opt/local/bin/python2.6
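A quick way to see which of these interpreters is actually present, and what version each reports, is to check every path in turn. This is only a sketch: the function name report_python is made up, and the bin/python location inside the 2.7 framework is an assumption based on the usual framework layout, since only the Versions/2.7 directory is given above.

```shell
#!/bin/sh
# Hypothetical helper: print the version reported by each interpreter path.
# Python 2 writes its "-V" output to stderr, so stderr is folded into stdout.
report_python() {
  for py in "$@"; do
    if [ -x "$py" ]; then
      echo "$py: $("$py" -V 2>&1)"
    else
      echo "$py: not found"
    fi
  done
}

# The three installs listed above (the 2.7 bin/python path is an assumption).
report_python \
  /usr/bin/python \
  /Library/Frameworks/Python.framework/Versions/2.7/bin/python \
  /opt/local/bin/python2.6
```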
My trouble was that the shell would always default to the 2.7 install when I needed it to use the MacPorts version. The following did not help:
$ sudo python_select python26
I even removed the 2.7 version, which only caused an error.
I figured out that I needed to put the MacPorts version first in my default path, using the following:
$ PATH=/opt/local/bin:$PATH ; export PATH
And then reinitiate the ports, etc.
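Which python runs depends on the order of directories in PATH: the shell takes the first match, scanning left to right, so the MacPorts directory has to come before any other directory that also holds a python. The sketch below demonstrates this with two hypothetical stand-in scripts in a temporary directory, not real interpreters.

```shell
#!/bin/sh
# Two stand-in "python" scripts (hypothetical, not real interpreters)
# in separate directories, mimicking the framework and MacPorts installs.
tmp=$(mktemp -d)
mkdir "$tmp/framework27" "$tmp/macports"
printf '#!/bin/sh\necho framework-2.7\n' > "$tmp/framework27/python"
printf '#!/bin/sh\necho macports-2.6\n' > "$tmp/macports/python"
chmod +x "$tmp/framework27/python" "$tmp/macports/python"

# MacPorts appended: the framework directory is searched first, so it wins.
appended=$(PATH="$tmp/framework27:$tmp/macports:/usr/bin:/bin"; python)
# MacPorts prepended: its directory is searched first, so it wins.
prepended=$(PATH="$tmp/macports:$tmp/framework27:/usr/bin:/bin"; python)

echo "appended:  $appended"   # framework-2.7
echo "prepended: $prepended"  # macports-2.6
rm -rf "$tmp"
```

On a real machine, `type -a python` lists every python on the current PATH in the order the shell will try them.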
Finally, following these instructions I was not able to run scrapy-ctl.py by default, so I had to reference the scrapy-ctl.py file directly.
A quick addendum to this post: here are the instructions for creating the links, from the Scrapy site (steps #2 and #3).
Starting with #2, “Add Scrapy to your Python Path”:
sudo ln -s /opt/local/Library/Frameworks/Python.framework/Versions/2.6/bin/scrapy-ctl.py /opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scrapy
And #3, “Make the scrapy command available”:
sudo ln -s /opt/local/Library/Frameworks/Python.framework/Versions/2.6/bin/scrapy-ctl.py /usr/local/bin/scrapy
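Both steps are plain symbolic links, so they can be wrapped in a small helper that creates or replaces the link and prints where it points. make_link is a hypothetical name, and the demo below runs in a scratch directory instead of the real paths above.

```shell
#!/bin/sh
# Hypothetical helper: create (or replace) a symlink and show its target.
# -s makes a symbolic link, -f replaces an existing link, and -n avoids
# descending into an existing link that points at a directory.
make_link() {
  target=$1
  link=$2
  ln -sfn "$target" "$link" || return 1
  echo "$link -> $(readlink "$link")"
}

# Try it in a scratch directory rather than /usr/local/bin.
demo=$(mktemp -d)
touch "$demo/scrapy-ctl.py"
make_link "$demo/scrapy-ctl.py" "$demo/scrapy"
```

Because `ln -sfn` replaces an existing link, re-running the step repoints the link instead of failing with "File exists". Writing into /usr/local/bin or site-packages usually still needs sudo, as in the original commands.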