Mechanize and BeautifulSoup are two essential modules for data acquisition.
However, Mechanize only works on Python 2. If you are on Python 3, there is an alternative that gives you the same form-handling workflow, and this post walks through it.

If you’re using Python 3 and try to import Mechanize to navigate through web forms, you’ll get this error:

Traceback (most recent call last):
  File "/Users/michaelcaraccio/PycharmProjects/WebScraping/test.py", line 3, in <module>
    import mechanize
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/mechanize/__init__.py", line 119, in <module>
    from _version import __version__
ImportError: No module named '_version'

Unfortunately, Mechanize is incompatible with Python 3 (see the issue "Support Python 3 #96").

But there's another way to make it work, which is covered in the Python 3 section below.
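If a script has to run where the scraping library may or may not import cleanly, one option is to probe for it at startup instead of crashing on the traceback above. This is a minimal sketch using only the standard library; the helper name `first_importable` is hypothetical, and the candidate names are simply the two libraries discussed in this post:

```python
import importlib


def first_importable(names):
    """Return the first module in `names` that imports without error, else None."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Covers "not installed" as well as the failure shown above,
            # where mechanize itself raises ImportError while loading.
            continue
    return None


if __name__ == "__main__":
    backend = first_importable(["mechanize", "mechanicalsoup"])
    print("Using:", backend.__name__ if backend else "no scraping backend found")
```

The two libraries do not share an API, so the rest of the script would still need to branch on which one was found.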

Python 2 - Code example

Before getting to the answer, let’s look at a working Python 2 example using BeautifulSoup and Mechanize. The following code logs in to your Twitter account and checks whether the login succeeded:

import mechanize
from bs4 import BeautifulSoup

if __name__ == "__main__":
    URL = "https://twitter.com/login"
    LOGIN = "yourlogin"  # email login
    PASSWORD = "yourpassword"
    TWITTER_NAME = "twittername"  # without @

    # Create a browser object
    browser = mechanize.Browser()
    browser.set_handle_robots(False)  # ignore robots.txt
    browser.addheaders = [('User-agent', 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36')]

    # Open webpage
    browser.open(URL)

    # Select the login form (the second form on the page) and fill it in
    browser.select_form(nr=1)
    browser.form['session[username_or_email]'] = LOGIN
    browser.form['session[password]'] = PASSWORD
    response = browser.submit()

    # Parse the response page and read the displayed user name
    userPage = BeautifulSoup(response, 'html.parser')
    user = userPage.find("a", {"class": "u-linkComplex"}).string

    # Check if connected
    if user == TWITTER_NAME:
        print("You're connected as " + user)
    else:
        print("You're not connected")

If you want to try this code, change the following variables:

  • LOGIN = "yourlogin" # email login
  • PASSWORD = "yourpassword"
  • TWITTER_NAME = "twittername" # without @
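Rather than editing the script and risking a password ending up in version control, the three values can also be read from the environment. This is a small sketch of that idea and not part of the original script; the variable names `TWITTER_LOGIN`, `TWITTER_PASSWORD` and `TWITTER_NAME` are made up for this example:

```python
import os

# Hypothetical environment variable names, chosen for this example only.
REQUIRED = ("TWITTER_LOGIN", "TWITTER_PASSWORD", "TWITTER_NAME")


def load_credentials(env=None):
    """Return (login, password, twitter_name) read from `env` (default: os.environ)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    # Strip a leading "@" so the name follows the "without @" convention above.
    return (
        env["TWITTER_LOGIN"],
        env["TWITTER_PASSWORD"],
        env["TWITTER_NAME"].lstrip("@"),
    )
```

In the script, the three assignments would then become `LOGIN, PASSWORD, TWITTER_NAME = load_credentials()`.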

Python 3 - Solution

MechanicalSoup

Mechanize does not appear to be maintained anymore. After some research I found this module: MechanicalSoup.

MechanicalSoup combines Mechanize-style browsing with BeautifulSoup in a single library and supports Python 2.6 through 3.4.

GitHub: MechanicalSoup.

Installation

With pip:

pip install MechanicalSoup

Or if you’re using PyCharm:
Preferences -> Project Interpreter -> select your project -> click the + button -> search for MechanicalSoup and install it
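To confirm that the package landed in the interpreter you are actually running, you can ask Python for the installed version. This is a sketch under one assumption: it relies on `importlib.metadata`, which is in the standard library only from Python 3.8 onward, so it will not run on the older versions mentioned above. The helper name `installed_version` is hypothetical.

```python
from importlib import metadata  # standard library since Python 3.8


def installed_version(dist_name):
    """Return the installed version string of `dist_name`, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("MechanicalSoup:", installed_version("MechanicalSoup") or "not installed")
```

If this reports "not installed" right after a successful `pip install`, pip and the script are probably using different interpreters.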

Fix previous code - Python 2 -> Python 3

After fixing my code with MechanicalSoup:

import mechanicalsoup  # Don't forget to import the new module

if __name__ == "__main__":

    URL = "https://twitter.com/login"
    LOGIN = "yourlogin"
    PASSWORD = "yourpassword"
    TWITTER_NAME = "twittername"  # without @

    # Create a browser object
    browser = mechanicalsoup.Browser()

    # Request the Twitter login page
    login_page = browser.get(URL)

    # Grab the login form
    login_form = login_page.soup.find("form", {"class": "signin"})

    # Find the login and password inputs and fill them in
    login_form.find("input", {"name": "session[username_or_email]"})["value"] = LOGIN
    login_form.find("input", {"name": "session[password]"})["value"] = PASSWORD

    # Submit the form
    response = browser.submit(login_form, login_page.url)

    # Verify we are now logged in (read the user name from the page)
    user = response.soup.find("a", class_="u-linkComplex").text

    if TWITTER_NAME in user:
        print("You're connected as " + TWITTER_NAME)
    else:
        print("Not connected")
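To see what the final check does without logging in anywhere, the same lookup can be run against a static HTML fragment. The sketch below is a stand-in rather than the code above: it uses the standard library's `html.parser` instead of BeautifulSoup so it runs with no extra installs, and the sample markup is invented for illustration (only the `u-linkComplex` class name comes from the script).

```python
from html.parser import HTMLParser


class FirstLinkText(HTMLParser):
    """Collect all text inside the first <a> whose class list contains `css_class`."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.inside = False  # True while inside the matching <a>
        self.done = False    # True once the first match has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "a" and not self.done and self.css_class in classes:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "a" and self.inside:
            self.inside = False
            self.done = True

    def handle_data(self, data):
        if self.inside:
            self.parts.append(data)


def logged_in_name(html, css_class="u-linkComplex"):
    """Return the link text, or None when no matching link is found."""
    parser = FirstLinkText(css_class)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip() or None


# Invented sample markup: the link text carries an "@" and nested tags.
SAMPLE = '<div><a class="u-linkComplex" href="/twittername">@<b>twittername</b></a></div>'
```

With this sample, `logged_in_name(SAMPLE)` returns `"@twittername"`. An equality test against `"twittername"` would fail here while `"twittername" in ...` succeeds, which is one reason a containment check like the one in the Python 3 script can be more forgiving when the link text carries extra characters.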

That’s it! Now you can log in to the website you want and start scraping!



Image credit

Green Tree Python by Ian C is licensed under CC BY-SA 2.0