Python 3 - Mechanize and BeautifulSoup

[Updated December 2019]

Mechanize and BeautifulSoup are two essential modules for data acquisition.
However, the Mechanize release I was using only works with Python 2. There is still a way to get the same workflow on Python 3, and I'll show you one solution.

If you’re using Python 3 and you try to import Mechanize to navigate through web forms, you’ll get this error:

Traceback (most recent call last):
  File "/Users/michaelcaraccio/PycharmProjects/WebScraping/test.py", line 3, in <module>
    import mechanize
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/mechanize/__init__.py", line 119, in <module>
    from _version import __version__
ImportError: No module named '_version'

Unfortunately, Mechanize is incompatible with Python 3: see the GitHub issue Support Python 3 #96.
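
The failing line in the traceback, from _version import __version__, is an implicit relative import, which Python 3 no longer supports. The sketch below reproduces the same failure with a throwaway package; the package names and the _demo_version module are hypothetical and exist only for this demo, they are not part of Mechanize.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def try_import_package(init_source, package_name):
    """Build a throwaway package on disk and report whether it imports.

    package_name is a hypothetical name used only for this demo.
    """
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = Path(tmp) / package_name
        pkg_dir.mkdir()
        (pkg_dir / "_demo_version.py").write_text('__version__ = "0.0.1"\n')
        (pkg_dir / "__init__.py").write_text(init_source)

        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module(package_name)
            return True
        except ImportError:
            return False
        finally:
            # Leave the interpreter as we found it
            sys.path.remove(tmp)
            sys.modules.pop(package_name, None)
            sys.modules.pop(package_name + "._demo_version", None)


# Python 2 style: implicit relative import, like the line in the traceback
old_style_ok = try_import_package(
    "from _demo_version import __version__\n", "demo_old_pkg")
# Python 3 style: explicit relative import (note the leading dot)
new_style_ok = try_import_package(
    "from ._demo_version import __version__\n", "demo_new_pkg")

print("implicit relative import works:", old_style_ok)
print("explicit relative import works:", new_style_ok)
```

On Python 3 only the second form should import, which is why a package written for Python 2 can fail at its very first line.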

But there's another way to make it work. You'll see it below.

Python 2 - Code example

Before giving you the answer, let’s look at a working Python 2 example using BeautifulSoup and Mechanize. The following code logs in to your Twitter account and checks whether the login succeeded:

import mechanize
from bs4 import BeautifulSoup

if __name__ == "__main__":
    URL = "https://twitter.com/login"
    LOGIN = "yourlogin" # email login
    PASSWORD = "yourpassword"
    TWITTER_NAME = "twittername" # without @

    # Create a browser object
    browser = mechanize.Browser()
    browser.set_handle_robots(False) # ignore robots.txt
    browser.addheaders = [('User-agent', 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36')]

    # Open webpage
    browser.open(URL)

    # Select the form (the second form on the page)
    browser.select_form(nr=1)
    browser.form['session[username_or_email]'] = LOGIN
    browser.form['session[password]'] = PASSWORD
    response = browser.submit()

    # Parse the response
    userPage = BeautifulSoup(response, 'html.parser')
    user = userPage.find("a", {"class": "u-linkComplex"}).string

    # Check if connected
    if user == TWITTER_NAME:
        print("You're connected as " + user)
    else:
        print("You're not connected")

If you want to try this code, change the following variables:

  • LOGIN = "yourlogin" # email login
  • PASSWORD = "yourpassword"
  • TWITTER_NAME = "twittername" # without @
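
If you would rather not hard-code these three values in the script, one option is to read them from environment variables. This is only a sketch: load_credentials and the TWITTER_LOGIN, TWITTER_PASSWORD and TWITTER_NAME variable names are my own assumptions, not something Mechanize or Twitter defines.

```python
import os


def load_credentials(env=None):
    """Return (login, password, twitter_name) read from a mapping.

    The TWITTER_* variable names are hypothetical, chosen for this sketch.
    """
    env = os.environ if env is None else env
    names = ("TWITTER_LOGIN", "TWITTER_PASSWORD", "TWITTER_NAME")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError("missing settings: " + ", ".join(missing))
    # Strip a leading "@" so the name follows the "without @" convention
    twitter_name = env["TWITTER_NAME"].lstrip("@")
    return env["TWITTER_LOGIN"], env["TWITTER_PASSWORD"], twitter_name


# Example with an in-memory mapping instead of the real environment
LOGIN, PASSWORD, TWITTER_NAME = load_credentials({
    "TWITTER_LOGIN": "me@example.com",
    "TWITTER_PASSWORD": "not-a-real-password",
    "TWITTER_NAME": "@twittername",
})
```

Calling load_credentials() with no argument reads from os.environ, so the same script can run on any machine without editing the source.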

Python 3 - Solution

MechanicalSoup

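Since Mechanize did not work for me on Python 3, I did some research and found this module: MechanicalSoup.

MechanicalSoup offers a Mechanize-like interface in a single library, built on top of Requests (for the HTTP session) and BeautifulSoup (for parsing the pages). When I first wrote this post it supported Python 2.6 through 3.4; check the project page for the versions supported today.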

GitHub : MechanicalSoup.

Installation

With pip:

pip install MechanicalSoup

Or, if you’re using PyCharm:
Preferences -> Project Interpreter -> Select your project -> Click on the + button -> Search for MechanicalSoup and install it
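
To confirm that the installation worked in the interpreter your project actually uses, a quick check like the following may help. It is a minimal sketch using only the standard library; is_installed is a hypothetical helper name.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Report which of the modules discussed in this post are available
for name in ("mechanicalsoup", "bs4", "mechanize"):
    status = "installed" if is_installed(name) else "missing"
    print(name + ": " + status)
```

If mechanicalsoup shows up as missing, pip probably installed it for a different Python than the one running the script.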

Python 3 example (Updated December 2019)

Here is the same login script, rewritten with MechanicalSoup:

import mechanicalsoup # Don't forget to import the new module

if __name__ == "__main__":

    URL = "https://twitter.com/login"
    LOGIN = "your_login"
    PASSWORD = "your_password"
    TWITTER_NAME = "displayed_name" # Displayed username on Twitter

    # Create a browser object
    browser = mechanicalsoup.StatefulBrowser()

    # request Twitter login page
    browser.open(URL)

    # we grab the login form
    browser.select_form('form[action="https://twitter.com/sessions"]')

    # print form inputs
    browser.get_current_form().print_summary()

    # specify username and password
    browser["session[username_or_email]"] = LOGIN
    browser["session[password]"] = PASSWORD

    # submit form
    response = browser.submit_selected()

    # get current page output
    response_after_login = browser.get_current_page()

    # Verify that we are logged in by looking for an <img> whose alt text is
    # the displayed name. Twitter generates its CSS class names dynamically,
    # so they are not a reliable hook; if you find a better check, let me know.
    # The alt value is quoted so that names containing spaces still parse.
    user_element = response_after_login.select_one('img[alt="' + TWITTER_NAME + '"]')

    # If such an image exists, the user is successfully connected
    if user_element is not None:
        print("You're connected as " + TWITTER_NAME)
    else:
        print("Not connected")
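
The login check in the example above builds a CSS attribute selector from the displayed name. As a sketch, a small helper can quote and escape that value so that names with spaces or double quotes still produce a valid selector. alt_selector is a hypothetical name of mine, not part of MechanicalSoup or BeautifulSoup.

```python
def alt_selector(name):
    """Build an img[alt="..."] CSS selector with the value safely quoted."""
    # Escape backslashes first, then double quotes, as CSS strings require
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return 'img[alt="{}"]'.format(escaped)


selector = alt_selector("Jane Doe")
print(selector)
```

The result can be passed straight to select_one(), for example response_after_login.select_one(alt_selector(TWITTER_NAME)).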


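In this example I use StatefulBrowser() instead of Browser() because it keeps track of the current page and the selected form between calls. After the login form is submitted and the HTTP redirect is followed, get_current_page() returns the page we landed on. Note that MechanicalSoup does not execute JavaScript, so this only works when the information you need is present in the HTML returned by the server.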

That’s it! Now you can log in to the website you want and start scraping!



Image credit

Green Tree Python by Ian C is licensed under CC BY-SA 2.0