Configuring Selenium WebDriver for Python on Cygwin

February 27, 2016

Until now we have mostly focused on UNIX-based systems in our tutorials. Truth be told, many of the customers requesting our scraping services rely on Windows machines to carry out their tasks.

Our work at 6020 peaks does not end with delivering a script and letting our customers figure out how to run it. Far from it: we try to help each customer deploy the crafted piece of software on their own systems. That’s the level of commitment we like in order to guarantee a happy and healthy relationship with our customers.

In a previous post, we addressed the topic of using Selenium for scraping Web data that is hidden behind AJAX interaction. In that tutorial, we relied on Python to build our scraper. This means that in order to run the resulting script, the customer needs to have a Python interpreter and all the required libraries installed on their own machine.

A very straightforward way of setting up a Python environment on Windows is to use Cygwin, which provides many of the command-line tools available on Linux. You can install Cygwin from here [http://cygwin.com/install.html], choosing the 32-bit or 64-bit version according to your system.

During the Cygwin installation you can choose which packages or tools you want to install (see screenshot below).

[Screenshot: Python package selection in the Cygwin installer]

For this tutorial, we need to install the following packages:

  • the Python interpreter (we use version 2.7.10),
  • wget for downloading files via HTTP, and
  • vim for editing files from the command line.

When the installation wizard has finished, open a Cygwin shell and make sure that the Python interpreter is available. To do so, run the following command:

$ python --version

This should display the version you installed, something like Python 2.7.10.
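The same check can also be made from inside Python, which is handy at the top of a script. The following is only a minimal sketch: the helper name describe_interpreter is ours and not part of any library. It relies on sys.platform reporting "cygwin" when the interpreter is Cygwin's own Python.

```python
import sys

def describe_interpreter():
    """Return (version string, True if this is Cygwin's Python)."""
    version = "%d.%d.%d" % tuple(sys.version_info[:3])
    return version, sys.platform == "cygwin"

if __name__ == "__main__":
    version, on_cygwin = describe_interpreter()
    print("Python %s (Cygwin: %s)" % (version, on_cygwin))
```

If the second value is False inside a Cygwin shell, the shell is probably picking up a native Windows Python instead of the one installed by Cygwin.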

Once our Python interpreter is available, we will install a Python package manager, namely pip, for handling the libraries required by our script. In order to install pip just run the following commands from Cygwin:

$ wget https://bootstrap.pypa.io/get-pip.py
$ python get-pip.py

Afterwards you are ready to install your Python dependencies. As we are going to use Selenium in our script we need to have this package in our system. You can install it using pip now by running:

$ pip install selenium
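To confirm that the installation worked, you can try importing the package. The snippet below is a small sketch with a hypothetical helper, is_importable, that reports whether a module can be imported in the current environment without crashing the script when it cannot.

```python
import importlib

def is_importable(module_name):
    """Return True if the named module can be imported here."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False

if __name__ == "__main__":
    print("selenium available: %s" % is_importable("selenium"))
```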

A useful feature of pip is that it can save all the Python dependencies installed in your system to a file:

$ pip freeze > requirements.txt

With this file you can install all those dependencies in a new Python environment:

$ pip install -r requirements.txt
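The file written by pip freeze is plain text with one name==version line per package, so it is easy to inspect from a script, for instance to check which Selenium version a customer machine was set up with. The following is a rough sketch; parse_requirements is a hypothetical helper, and it deliberately ignores anything that is not a simple name==version pin.

```python
def parse_requirements(text):
    """Map package names to pinned versions from 'name==version' lines."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not a simple pin
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins

if __name__ == "__main__":
    with open("requirements.txt") as handle:
        print(parse_requirements(handle.read()).get("selenium"))
```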

Now, once the Python interpreter and the required dependencies are available in our environment, we can focus on using Selenium from Cygwin.

As you know, Selenium WebDriver works with Mozilla Firefox without requiring any kind of extra driver, unlike Google Chrome and Internet Explorer, which each need one. However, when using Cygwin your Python environment won’t be able to find the Firefox executable in the path by default, which leads to an error.

Assuming you already have Firefox installed on your system (if not, install it now), you can fix this issue by adding the directory that contains firefox.exe to the Cygwin PATH variable.

First check where exactly firefox.exe resides on your system. On Windows 7 it can normally be found here:

C:/Program Files (x86)/Mozilla Firefox/

This Windows path can be accessed from Cygwin through the following path:

/cygdrive/c/Program\ Files\ \(x86\)/Mozilla\ Firefox/
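The mapping from a Windows path to its Cygwin form is mechanical: the drive letter becomes a lowercase directory under /cygdrive, and spaces and parentheses need a backslash when the path is typed in bash. The sketch below illustrates this with two hypothetical helpers, windows_to_cygwin and shell_escape. It assumes the default /cygdrive prefix; on a real Cygwin installation the cygpath tool (cygpath -u) performs the conversion for you.

```python
import re

def windows_to_cygwin(path, prefix="/cygdrive"):
    """Convert 'C:/dir' or 'C:\\dir' into '/cygdrive/c/dir' (unescaped)."""
    path = path.replace("\\", "/")
    match = re.match(r"^([A-Za-z]):/(.*)$", path)
    if not match:
        raise ValueError("not an absolute Windows path: %r" % path)
    drive, rest = match.groups()
    return "%s/%s/%s" % (prefix, drive.lower(), rest)

def shell_escape(path):
    """Backslash-escape spaces and parentheses for use in a bash command."""
    return re.sub(r"([ ()])", r"\\\1", path)

if __name__ == "__main__":
    cygwin_path = windows_to_cygwin("C:/Program Files (x86)/Mozilla Firefox/")
    print(shell_escape(cygwin_path))
```

For the Firefox directory, this prints the escaped Cygwin path given above.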

This is the path we need to append to our PATH variable. To do so, open your .bash_profile from a Cygwin shell:

$ vim .bash_profile

Go to the end of the file (press Shift+G), press i to enter insert mode, and type the following:

PATH=$PATH:/cygdrive/c/Program\ Files\ \(x86\)/Mozilla\ Firefox/
export PATH

Press Esc to leave insert mode in vim and then type “:wq” to save your changes and quit. This returns you to the Cygwin shell prompt. Now run the following to reload your profile and pick up the new path:

$ source .bash_profile

If you now run echo $PATH, you should see the Firefox path at the end.
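Seeing the directory in PATH does not yet prove that firefox.exe is actually inside it. As a quick sanity check, the following sketch walks the PATH entries and reports the first directory containing a given file. The helper find_on_path is hypothetical and written by hand so that it also runs on Python 2.7.

```python
import os

def find_on_path(filename, path_value=None):
    """Return the first directory on PATH containing filename, or None."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    for directory in path_value.split(os.pathsep):
        if directory and os.path.isfile(os.path.join(directory, filename)):
            return directory
    return None

if __name__ == "__main__":
    print(find_on_path("firefox.exe"))
```

If this prints None from a Cygwin shell, double-check the escaping in .bash_profile and the actual install location of Firefox.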

In the next step, we have to modify our python script so that it is able to find the right Cygwin path to Firefox. Before any changes, our code usually looks like this and works perfectly outside Cygwin:

from selenium import webdriver

class Spider():
    def __init__(self):
        # Start Firefox with a default, temporary profile
        self.driver = webdriver.Firefox()
        # From this point we can use the WebDriver API

To adapt this code to Cygwin we will create a Firefox profile to initialise our WebDriver instance. To do so, add the following class before the existing Spider class. Pay attention to the additional imports that are required:

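from selenium.webdriver.firefox.firefox_profile import FirefoxProfile
from subprocess import Popen, PIPE

class CygwinFirefoxProfile(FirefoxProfile):
    @property
    def path(self):
        path = self.profile_dir
        try:
            # cygpath -d prints the DOS (short) Windows form of a Cygwin path;
            # universal_newlines=True gives text output on Python 2 and 3
            proc = Popen(['cygpath', '-d', path], stdout=PIPE, stderr=PIPE,
                         universal_newlines=True)
            stdout, stderr = proc.communicate()
            # Keep the original path if cygpath failed or printed nothing
            if proc.returncode == 0 and stdout.strip():
                path = stdout.splitlines()[0]
        except OSError:
            # cygpath is not installed, so we are not running under Cygwin
            print("No cygwin path found")
        return path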

Note that this profile is compatible with non-Cygwin environments as well, which is great if we still want to run our script on a Linux or Mac OS X machine without making any changes (kudos to the StackOverflow user who posted this solution).
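The fallback behaviour can be seen in isolation, without Selenium. The sketch below is a hypothetical standalone function, to_windows_path, that follows the same idea as the profile class: ask cygpath for the Windows form of a path, and return the path unchanged when the tool is missing or fails. The tool parameter exists only so that the fallback can be exercised on any machine.

```python
from subprocess import Popen, PIPE

def to_windows_path(path, tool="cygpath"):
    """Return the DOS-style form of path via cygpath, or path unchanged."""
    try:
        proc = Popen([tool, "-d", path], stdout=PIPE, stderr=PIPE,
                     universal_newlines=True)
        stdout, _ = proc.communicate()
    except OSError:
        # The tool is not installed: we are not running under Cygwin
        return path
    lines = stdout.splitlines()
    if proc.returncode != 0 or not lines:
        # The tool ran but gave no usable answer: keep the original path
        return path
    return lines[0]

if __name__ == "__main__":
    print(to_windows_path("/tmp"))
```

On Linux or Mac OS X, where cygpath does not exist, the call simply returns its argument, which is why the same script keeps working there.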

Next, modify the Spider class and pass a reference to the Firefox profile to the WebDriver constructor:

class Spider():
    def __init__(self):
        # Use the Cygwin-aware profile so Firefox receives a Windows path
        firefox_profile = CygwinFirefoxProfile()
        self.driver = webdriver.Firefox(firefox_profile)

Finally, to run the script from Cygwin, copy the Python file to a folder accessible from Cygwin, e.g. your home folder, which normally resides in C:\cygwin\home\{your-user-name}. Once there, and assuming your current Cygwin directory is your home directory, run your script with:

$ python your-python-crawler.py

This will start a new Firefox window and run the interaction you implemented in your python crawler script.

We hope this guide serves as a reference to our customers and gives you an idea of how easily you can offer a better service to your own customers with very little effort. If you use any other solution, please let us know in the comments.