
Study Materials

Lecture 8 Motif Finding Animation

Gibbs Sampler, Weak Motif Animation (AVI) courtesy of Professor Chris Burge.

Python Programming

Most of the homework assignments will include problems that involve writing simple programs in the scripting language Python. Python, like Perl, is widely used in bioinformatics and computational biology. Because many students may have little or no programming experience, Dr. Peter Woolf will offer a hands-on Python tutorial in three sessions during the second week of classes.

Python Tutorial Overview

The aim of this tutorial is to give students a basic working knowledge of the scripting language Python. The tutorial is intended for students with little or no programming experience, and will focus on the tools and utilities needed to do research in bioinformatics and computational biology.

My goal is to make the class informal and hands-on, so please speak up if something does not make sense. Programming cannot easily be learned by watching; it must be learned by doing.

At minimum, by the end of this class, you should be able to read a FASTA sequence from a file, parse it, and write the reverse complement of that sequence to a file.

Tutorial Outline

Session One: Introduction to Unix, Text Editors, Basic Python Commands and Data Structures

Session Two: Flow Control in Python, Input/Output, Files, HTML

Session Three: Modules, Program Organization, and Regular Expressions

Text

Lutz, Mark, and David Ascher. Learning Python. 2nd ed. Beijing; Cambridge, MA: O'Reilly, 2003. ISBN: 9780596002817.

Online Resources

The tutorial will roughly follow the structure of the standard documentation tutorial that can be found at: Online Python Tutorial.

If you are already a proficient programmer, look at: Dive into Python.

A good Unix-command Cheat Sheet can be found at: Unix-command Cheat Sheet.

For an introduction to regular expressions: Regular Expression HOWTO.

To quickly test your regular expressions, try the program: Kodos.

Finally, for lots of examples of good Python code related to Bioinformatics and Computational Biology, see: Biopython Web site.

In Class Exercise for Session Two

Review the notes on Unix Commands and Beginner's Python (PDF).

  • Parse the string in fasta.txt (TXT) to obtain the reverse complement of the sequence section alone. Output this new string to a file called output.txt.
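One possible shape for this exercise is sketched below. It is not the course's reference solution: it assumes modern Python 3, a single-record FASTA file whose first line is a header starting with ">", and a plain DNA alphabet (A, C, G, T). The function names parse_fasta, reverse_complement, and write_reverse_complement are made up for this illustration.

```python
# Translation table mapping each DNA base to its complement.
# Assumption: only A, C, G, T (either case) need handling; other characters pass through.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def parse_fasta(text):
    """Split single-record FASTA text into (header, sequence)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA header line starting with '>'")
    header = lines[0][1:]            # drop the leading '>'
    sequence = "".join(lines[1:])    # join the wrapped sequence lines
    return header, sequence


def reverse_complement(sequence):
    """Complement every base, then reverse the whole string."""
    return sequence.translate(COMPLEMENT)[::-1]


def write_reverse_complement(in_path, out_path):
    """Read a FASTA file and write the reverse complement of its sequence."""
    with open(in_path) as handle:
        _, sequence = parse_fasta(handle.read())
    with open(out_path, "w") as handle:
        handle.write(reverse_complement(sequence) + "\n")
```

With these definitions, calling write_reverse_complement("fasta.txt", "output.txt") would carry out the exercise on the files named above. Keeping the parsing and the sequence manipulation in separate functions makes each one easy to test at the interpreter prompt.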

In Class Exercises for Session Three

In Python you can write programs that run as standalone programs or that can be imported into other Python code. In fact, you have already been using Python programs every time you use an import statement.
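A minimal sketch of a file that works both ways is shown below. It is not SampleProg.py; the file name compare.py and the functions compare_numbers and main are hypothetical, and the code assumes modern Python 3. The key idea is the check on __name__, which is "__main__" only when the file is run directly.

```python
import sys


def compare_numbers(a, b):
    """Return a short phrase describing how a relates to b."""
    if a < b:
        return "less than"
    if a > b:
        return "greater than"
    return "equal to"


def main(argv):
    """Compare two numbers passed on the command line."""
    if len(argv) != 3:
        print("usage: python compare.py NUMBER NUMBER")
        return
    try:
        a, b = float(argv[1]), float(argv[2])
    except ValueError:
        print("both arguments must be numbers")
        return
    print(f"{a} is {compare_numbers(a, b)} {b}")


# This block runs only when the file is executed directly,
# not when it is imported from other Python code.
if __name__ == "__main__":
    main(sys.argv)
```

Saved as compare.py, this could be run from the shell as "python compare.py 3 5", or loaded with "import compare" so that compare.compare_numbers can be called from the interpreter or from another program.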

As an example of the framework of a basic Python program, see SampleProg.py (PY).

  • Load SampleProg.py from within the Python Interpreter and test the functions read_format and do_comparison.
  • Modify SampleProg.py so that it uses do_comparison to compare two numbers given on the command line.

Regular expressions are a powerful text parsing tool that is widely used in bioinformatics. See the notes on regular expressions (PDF) for a summary of the commands.

  • Write a regular expression to extract all of the carbon atom position data from the file example.pdb (PDB). Print this data out.
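One way to approach the PDB exercise is sketched below, using Python's re module. Several things here are assumptions rather than facts about example.pdb: the sample lines are made up, the fields are assumed to be separated by whitespace (real PDB files are fixed-column, and fields can run together), and any ATOM record whose atom name starts with C is treated as a carbon. The names CARBON_ATOM and carbon_positions are hypothetical.

```python
import re

# Made-up ATOM records in the usual PDB layout; not taken from example.pdb.
SAMPLE_PDB = """\
ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N
ATOM      2  CA  MET A   1      26.266  25.413   2.842  1.00 10.38           C
ATOM      3  C   MET A   1      26.913  26.639   3.531  1.00  9.62           C
ATOM      4  O   MET A   1      27.886  26.463   4.263  1.00  9.62           O
ATOM      5  CB  MET A   1      25.112  24.880   3.649  1.00 13.77           C
"""

# ATOM record, serial number, an atom name starting with C, then the first
# three decimal numbers on the line, taken to be the x, y, z coordinates.
CARBON_ATOM = re.compile(
    r"^ATOM\s+\d+\s+(C\S*)\s+.*?(-?\d+\.\d+)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)",
    re.MULTILINE,
)


def carbon_positions(pdb_text):
    """Return (atom_name, x, y, z) for each carbon ATOM record in the text."""
    return [
        (name, float(x), float(y), float(z))
        for name, x, y, z in CARBON_ATOM.findall(pdb_text)
    ]


for name, x, y, z in carbon_positions(SAMPLE_PDB):
    print(name, x, y, z)
```

For the real file, the text could be read with open("example.pdb").read() and passed to carbon_positions. If fields in the file touch each other, slicing by the fixed PDB column positions would be more reliable than a whitespace-based pattern.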