Programming Blackberry

CyanogenMod 11 on Verizon Samsung Note 2

It's been almost four years since I've written to this blog and many things have changed. First, the blog should no longer be called "Programming Blackberry" because I actually gave that up a long time back for Android and iOS. I think most people would agree I made the right choice at the right time. I'm moving this blog to a self-hosted site, and while I work out the technical details of that move (hopefully keeping all of the articles and great comments I've received), I figured I'd document something I struggled with a bit over the last day: getting my Note 2 flashed with CyanogenMod 11.

It wasn't a straightforward install: I had a different version of stock Samsung Android than the one required, and I ran into an issue that took some tricks to work around.

I'll give credit where it's due: without these sites I would not have been able to accomplish this. Additionally, developers such as Adam Outler have written exploits and tools without which we couldn't break free from the Samsungs, HTCs, and LGs of the world and get the latest Android on our devices.

Note: My Note 2 had always run the stock Samsung/Verizon version of Android; it was never flashed with any other ROM, and I never accepted any OTA upgrades from 4.1.1. Follow this guide step by step only if you're in the same boat. If you have upgraded there's still hope: check some of the linked sites below to see if your version will work with the Odin3 tool.

I actually used both a Windows machine and a Mac to get the job done. It may be possible to do it all on one or the other, but I used the Windows version of Odin to upload files to the device. JOdin3 apparently works on the Mac, but I didn't learn it existed until after the job was done.

Lastly, I don't claim to be an expert in custom ROMs, so there may be many better ways to get CM on your Note 2. All I can say is that this method worked for me, so it may help you as well.

Please do refer to these sites if you run into problems with what I've written up; they have extensive troubleshooting guides as well as hundreds of pages of user comments that you can browse.

Revert Android to the VRALJB Version

It is important to note that the exploits linked to here will only work on certain releases of Android. If you check your device and don't see one of the listed versions, you may be tempted to give up on Cyanogen or other releases, but DON'T! Thankfully, for Samsung devices someone wrote the Odin tool, which lets us replace whichever version is currently installed (EXCEPT the 4.3 OTA) with one that's suitable for rooting and customizing (other brands have similar tools, so don't lose hope). Which file you download in Step 3 below depends on which OTA updates you've taken. In my case I hadn't upgraded in a while and hadn't accepted the VRALL4 or VRAMC3 OTA updates, so I downloaded the stock version indicated. I didn't document which version I had, but I believe it was I605VRBLJ1. Check section 1b on this site before downloading for Step 3. Using this strategy, I did the following:

Step 1: Download and install Odin for Windows here
Step 2: Download SCH-I605_16gb Pit file here
Step 3: Download VRALJB stock version of Android here
Step 4: Put the device into Odin mode by pressing volume down, home, and power at the same time. You'll get a warning screen; press volume up and the device will enter Odin mode, ready to accept the stock VRALJB version.
Step 5: Assuming there are no USB driver issues, connect the Note 2 to the Windows machine and open Odin3. Make sure a COM port is listed in the ID:COM section, which indicates Odin can see the Note 2.

Step 6: Select the Pit button and point to the pit download from Step 2, then the PDA button and point to the VRALJB version downloaded previously.

Step 7: Hit Start in Odin3 and after a few minutes the VRALJB version will be flashed. Progress can be seen on the blue bar at the bottom of the "Downloading" screen on the Note 2.

Rooting and Unlocking the Bootloader: ClockworkMod vs. TeamWin Recovery

Step 1: Download the following files:

VRALJB Pit File: Used to prep for unlock.
Baseline Bootloader: Stock bootloader and kernel
CASUAL Injection 9: CASUAL leverages adb to automate the Verizon-specific exploit
Bootloader 2: Validates and executes the kernel; runs every time at startup

Step 2: Install Baseline Bootloader

Place device into Odin mode by hitting volume down, home key, and power button. When warning screen comes up, hit volume up. Connect device to Windows laptop and launch Odin.
Select the Pit button and point to the VRALJB pit file downloaded above
Select the bootloader button and point to the Baseline bootloader downloaded above (BootloaderBaseline2.tar.md5.gz)
Select Start, wait until it completes, and reboot the device

Step 3: Run the CASUAL Exploit

Reconnect the device to the Windows laptop and double-click the CASUAL jar file downloaded above
Select the Pit button and point to the VRALJB pit file downloaded above
Wait until the device is recognized by CASUAL; you'll know it is when the "Root and Unlock Me Now" button is enabled (make sure USB debugging is enabled on the device).
Select the "Root and Unlock Me Now" button and wait a little while, press Volume Up when prompted on the device and reboot.

Step 4: Install Bootloader

Put phone back into Odin mode.
Select the Bootloader button and point to the SuckItVerizon_Odin_Package2.tar.md5.gz file downloaded above. This is also the step where you could install CWM instead.
Select Start and wait until it completes. Disconnect and reboot.

Installing the non-working TWRP recovery is OK, since you can easily replace it with CWM using the Heimdall tools, as I'll document in the next step. The Heimdall tool is a command-line program that lets us flash firmware onto Samsung devices. You can confirm you've got root access to the Note 2 at this point by connecting with adb, running 'adb shell' and trying 'su root'.

Replacing TWRP Recovery with CWM

We must replace the TWRP recovery with the CWM recovery, and we'll use the Heimdall tools to do it. I did this on a Mac, but the tools should work on Windows or Linux just as well. In my case Heimdall couldn't claim the USB port, so it couldn't run. I determined that the errors are specific to Macs with the Samsung drivers installed, and that those drivers must be unloaded to run the toolset.

Step 1: You can find the toolset here. Download and install the tools, open a terminal window and run 'heimdall' at the command line. If you get its usage information, you're good to go.
Step 2: Download the ClockworkMod Touch recovery image here
Step 3: Put the Note 2 into Odin (download) mode as before, connect it to the computer, and run the following command: 'heimdall flash --RECOVERY /recovery-clockwork-touch- --no-reboot', where the path and version are specific to what you downloaded and where you placed it. In my case the command was 'heimdall flash --RECOVERY ~/Desktop/recovery-clockwork-touch-6.0.4.3-i605-2.img --no-reboot'.

Step 4: If you get no errors you'll see 'RECOVERY upload successful'; in my case, however, I needed to unload the Samsung driver that had claimed the USB port. I received the error 'libusbx: error [darwin_claim_interface] USBInterfaceOpen: another process has device opened for exclusive access'.
Run the following set of commands from Terminal:

sudo kextunload -b com.devguru.driver.SamsungComposite
sudo kextunload -b com.devguru.driver.SamsungACMData
sudo kextunload -b com.devguru.driver.SamsungACMControl

Step 5: Now try the same heimdall command again ('heimdall flash --RECOVERY /recovery-clockwork-touch- --no-reboot') and you should get a successful result.

You've now got CWM Recovery installed and you are ready to get up and running on CyanogenMod.

Install CyanogenMod 10/11

CyanogenMod for the Note 2 on Verizon can be found here. I first attempted to go straight to the latest nightly version of CM11 and found that it couldn't be installed, even with CWM. Not wanting to give up after investing so much time, I tried installing CM10 first and then upgrading it with the CM update system and.....SUCCESS!!! So here's what you do:

Step 1: Download Stable version of CM10 here
Step 2: Download the Jellybean version of Gapps found here and the KitKat version of Gapps found here. We need both versions because the CyanogenMod stock launcher for CM 11, Trebuchet, crashed repeatedly for me, so we're going to use the Play Store app to download a few different launchers before installing CM 11. However, it became clear in my research that the KK version of Gapps won't work with JB and vice versa. You could try keeping the JB Gapps and skipping the replacement in a later step, but do so at your own risk (of time wasted...or not wasted).
Step 3: Copy all the files to a microSD card and insert it into the Note 2
Step 4: Boot into CWM by holding the up volume key, home key and power key.
Step 5: Clear the cache partition and do a wipe data/factory reset using the touch interface of CWM
Step 6: Install CM 10.1.3 by selecting 'install zip' then 'install zip from /external_sd'. Select the cm-10.1.3-605.zip file and accept the install. Repeat the same process to install the Jellybean version of GApps, which should be named 'gapps-jb-20130812-signed.zip'.

Step 7: Reboot the device and you should successfully get into CyanogenMod 10!!!!
Step 8: Sign into your Google account during setup, then open Play and install a set of launchers of your choice (good, popular ones are Google Now Launcher, Nova Launcher, Apex, and Buzz).
Step 9: Relish the success, but we're not done yet because we want CM 11. In the CyanogenMod 10 settings menu, browse to 'About this Phone->CyanogenMod Updates' and select 'Update Types->All Versions Stable Only'. Select the cm-11-20140609-SNAPSHOT release and download/install it.
Step 10: After CM 11 installs, the phone will reboot; stop it before it starts up by pulling the battery. Then enter CWM recovery one last time by pressing volume up, home, and power. Install zip as before from the external SD card and browse to the KitKat version of Gapps: gapps-kk-20140105-signed.zip. Install it, then reboot.
Step 11: Now you can truly relish the victory of getting CM 11 installed on the Note 2!!!

Wrapup

Again, if you run into issues during these steps please refer to the linked sites which helped me to get through this process. So far I'm enjoying CM11 but I'll have to dig deeper into it to truly enjoy the benefits and customizations. I can already say that I've noticed the Note 2 is much more responsive and snappy than before which is a nice bonus.

Simple Web Crawler in Python - Parse Domain Links Using urllib2 and HTMLParser

I know there are quite a few "Simple Python Crawlers" out on the web for easy download and use. Nonetheless, I felt like I'd add yet another to the mix - Hey, innovation doesn't work without choice, right? Writing a basic web-crawler is pretty simple if you leverage Python's built-in modules that handle the most difficult aspects: opening and managing socket connections to remote servers and parsing the returned HTML. The Python modules urllib2 and HTMLParser provide you with the high-level interface to these lower level processes. The crawler I've written for the tutorial leverages these modules, runs from the command-line, and takes the following two arguments:

"seed url" - where the crawler will begin its parsing
"all" or "local"

The "local" flag tells the crawler to parse only the http links that are contained within the "seed url" domain (local). This means that the crawler will eventually stop, because there are a limited number of links within a domain. Note that if you were to crawl a large domain, like www.microsoft.com, it could take a very long time to complete the crawl. Caveat #1 (IMPORTANT!): This crawler only follows links that begin with "http" and compares their domain to the seed url's domain to stay within the site. Relative links on the page (e.g., a href="/") have no scheme, so this crawler won't pick them up. You'll have to add that functionality if that's what you're looking for (but that should be fairly easy).
The "all" flag tells the crawler to parse every http link it finds within the HTML, even if it's outside the domain. This means the spider could take a very, very, very long time to complete its crawl (years?). I'd suggest running this only if you'd like to see how quickly the number of pending links explodes as the spider crawls. You won't want to run it for long, though, because your machine will likely run out of memory.
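Caveat #1 leaves relative links as an exercise. Here is a minimal sketch of that, plus the "local"/"all" filter, written in Python 3 (the tutorial's own code is Python 2 with urllib2). The helper names `resolve_link` and `should_queue` are hypothetical, and unlike the original, which follows only http links, this version also accepts https.

```python
from urllib.parse import urldefrag, urljoin, urlparse


def resolve_link(page_url, href):
    """Turn an href (relative or absolute) into an absolute URL without its #fragment."""
    absolute = urljoin(page_url, href)
    return urldefrag(absolute)[0]


def should_queue(seed_url, link, mode):
    """Return True if `link` should be crawled under the 'local' or 'all' rule."""
    parsed = urlparse(link)
    if parsed.scheme not in ("http", "https"):
        return False  # skip mailto:, javascript:, ftp:, etc.
    if mode == "local":
        # Stay on the seed URL's domain
        return parsed.netloc == urlparse(seed_url).netloc
    return mode == "all"
```

With this in place, an href of "/search/label/blackberry" found on any page of the blog resolves to a full http URL on the same domain, so a "local" crawl keeps it.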

Before we begin, you can get the entire source code here but I'd recommend taking a look at the step-by-step below so you can understand how to customize it to your needs.

Caveat #2: Although I've run the program against a handful of sites without problems, I haven't tested it very thoroughly. There could be errors or situations where it crashes, or it could even be giving incorrect link counts. In the coming days I intend to test it more, but if you run into problems let me know in the comments.

Run the Program from the Command-Line

Nothing too complex here: If you'd like to run the crawler to parse only the local domain links on this website you'd give the following command from the command-line:

python spider.py http://berrytutorials.blogspot.com local

Otherwise, if you want to crawl the web starting with my site as the seed url then you'd run the following command:

python spider.py http://berrytutorials.blogspot.com all
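The post doesn't show how spider.py reads its two arguments. One option, sketched here in Python 3 with the standard argparse module, is to let argparse reject anything other than "all" or "local" up front; the `parse_args` function name is my own, not taken from the original source.

```python
import argparse


def parse_args(argv=None):
    """Parse `<seed url> all|local`, mirroring the spider.py command line."""
    parser = argparse.ArgumentParser(description="Simple web crawler")
    parser.add_argument("seed_url", help="URL where the crawl begins")
    parser.add_argument(
        "mode",
        choices=["all", "local"],
        help="'local' stays on the seed domain, 'all' follows every http link",
    )
    # argv=None makes argparse read sys.argv[1:]
    return parser.parse_args(argv)
```

A mistyped mode such as "locl" then exits with a usage message instead of silently running an "all" crawl.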

The program gives you status updates, printing the number of pending URLs in the queue along with the number of links (URLs) that have been processed, and when it completes, the total number of links it found. Along the way, as HTMLParser processes the HTML, you'll likely see parsing errors caused by malformed tags and other problems that HTMLParser cannot gracefully skip over. The following is what the tail end of the output looks like:

.....
.....
.....

Crawl Exception: Malformed tag found when parsing HTML
bad end tag: "", at line 1266, column 16

15 Pending URLs are in the queue.
369 URLs have been fully processed.

10 Pending URLs are in the queue.
374 URLs have been fully processed.

5 Pending URLs are in the queue.
379 URLs have been fully processed.

Total number of links: 382
Main-Mini:Desktop john$

I understand that there are better HTML parsers out there in Python such as BeautifulSoup that might be able to handle poorly-formed HTML, however I'm a bit more familiar with HTMLParser.

Overall Architecture of the Simple Crawler

The base design of the crawler consists of the following:

Spider class: The main class, which defines two dictionaries: one holding the pending URLs to be processed and one holding the visited URLs that are complete. The visited URLs dictionary maps each URL to the HTML that was parsed by HTMLParser, so you can further process the link content as suits your application. Spider also defines a method called "startcrawling()", which is called to begin the crawl.
LinksHTMLParser: The HTML parsing class, instantiated as a local variable within Spider's startcrawling() method. This class extends the base HTMLParser by overriding the handle_starttag method so it only parses out anchor tags. It also defines an instance attribute named "links", a dictionary holding the links found on the page, so the Spider can access them and perform further processing.

Spider Class Details

The main algorithm is in the Spider class' startcrawling() function and operates as follows (in semi-pseudo-code):

While there are URLs in the pendingUrls dictionary:
     pop another URL from the pendingUrls dictionary to process
     make a HEAD request to the URL and check its content-type
     if the content-type is not 'text/html', continue (skip to the next iteration of the loop)
     open the URL and read in the HTML
     add the URL to the dictionary of visited URLs
     for each of the HTTP links found when processing the HTML:
          parse the link to make sure it is syntactically correct
          check to make sure it's HTTP and it hasn't already been visited
          if the command-line option is 'local', check the domain of the link:
               if the domain is not the same, disregard it; otherwise add it to pendingUrls
          otherwise, if adding all links, just add it to pendingUrls

Refer to the following code detailing the Spider class:

import sys
import re
import urllib2
from urllib2 import URLError

# Snow Leopard Fix for threading issues Trace/BPT trap problem
urllib2.install_opener(urllib2.build_opener())
from urlparse import urlparse
import threading
import time
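from HTMLParser import HTMLParser


class HeadRequest(urllib2.Request):
 """urllib2.Request subclass that sends HEAD instead of GET"""
 def get_method(self):
  return "HEAD"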

"""
Spider takes a starting URL and  visits all links found within each page
until it doesn't find anymore 
"""
class Spider():
 
 def __init__(self,sUrl, crawl):
 
  #Urlparse has the following attributes: scheme, netloc, path, params,query,fragment
  self.startUrl = urlparse(sUrl)
  self.visitedUrls = {} # Map of link -> page HTML
  self.pendingUrls = {sUrl:sUrl} # Map of link->link. Redundant, but used for speed of lookups in hash
  self.startUrlString = sUrl
  self.crawlType = crawl
  self.numBrokenLinks = 0
  self.numTotalLinks = 0
  
 """ Main crawling function that parses the URLs, stores the HTML from each in visitedUrls
   and analyzes the HTML to acquire and process the links within the HTML"""
 def startcrawling(self):
   
  while len(self.pendingUrls) > 0:
   try:
    
    self.printProcessed()
   
    currUrl = self.pendingUrls.popitem()[0]  
    
    
    # Make HEAD request first to see if the type is text/html
    url = urllib2.urlopen(HeadRequest(currUrl))
    conType = url.info()['content-type']
    conTypeVal = conType.split(';')
    
    # Only look at pages that have a content-type of 'text/html'
    if conTypeVal[0] == 'text/html':
 
     url = urllib2.urlopen(currUrl)
     html = url.read()
     
     # Map HTML of the current URL in process in the dictionary to the link
     # for further processing if required
     self.visitedUrls[currUrl] = html
     
     # LinksHTMLParser is extended to take out the a tags only and store 
     htmlparser = LinksHTMLParser()
     htmlparser.feed(html)
     
     # Check each of the a tags found by Parser and store if scheme is http
     # and if it already doesn't exist in the visitedUrls dictionary
     for link in htmlparser.links.keys(): 
      url = urlparse(link)
      
      if url.scheme == 'http' and not self.visitedUrls.has_key(link): 
       if self.crawlType == 'local': 
        if url.netloc == self.startUrl.netloc:
         if not self.pendingUrls.has_key(link):
          self.pendingUrls[link] = link
            
       else: 
        if not self.pendingUrls.has_key(link):    
         self.pendingUrls[link] = link
           

   
   # Don't die on exceptions.  Print and move on
   except URLError:
    print "Crawl Exception: URL parsing error" 
    
   except Exception,details:
    print "Crawl Exception: Malformed tag found when parsing HTML"
    print details
    # Even if there was a problem parsing HTML add the link to the list
    self.visitedUrls[currUrl] = 'None'
    
  # Count the fully processed pages for both 'local' and 'all' crawls
  self.numTotalLinks = len(self.visitedUrls)
 
  print "Total number of links: %d" % self.numTotalLinks

You can see the main loop processes links while there are still pendingUrls in the queue (while len(self.pendingUrls) > 0). It takes the next URL to process by removing it from the pendingUrls dictionary with the popitem() function.

Note that because I'm using dictionaries there is no guaranteed order to the processing of the links; popitem() removes an arbitrary entry. An improvement/enhancement/customization might be to use an actual queue and process the links in the order they were added. In my case, I didn't think the order mattered in the long run. For visitedUrls I used a dictionary mainly because I wanted quick (average O(1)) lookups when processing the HTML down the road.
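If processing order matters to you, a collections.deque gives the first-in, first-out queue described above. This Python 3 sketch uses a hypothetical `links_of(url)` callback that returns the links found on a page, and it also records each URL's depth, which is one way to implement the depth limit suggested in the enhancements list further down.

```python
from collections import deque


def bfs_order(seed, links_of, max_depth):
    """Return URLs in the order they are dequeued (FIFO), up to max_depth links from seed.

    `links_of(url)` is a hypothetical callback returning the links found on that page.
    """
    queue = deque([(seed, 0)])  # (url, depth) pairs, oldest first
    seen = {seed}               # average O(1) membership tests, like the dict keys
    order = []
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth >= max_depth:
            continue  # at the depth limit: don't follow this page's links
        for link in links_of(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return order
```

The `seen` set plays the role of the dictionary lookups in the original: a URL is queued at most once, no matter how many pages link to it.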

Next, a HEAD request is made to the current URL to check the 'content-type' value in the response header. If the content type is 'text/html', we process it further. I went this route because I didn't want to process documents (.pdf, .doc, .txt, etc.), images (.jpg, .png, etc.), audio/video, and so on; I only want to look at HTML pages. I also make the HEAD request before downloading the entire page so the crawler is more "polite"; i.e., so it doesn't eat up server resources sending entire pages unless it's absolutely necessary.
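For reference, here is roughly what the HEAD check looks like in Python 3, where urllib2's functionality lives in urllib.request and Request accepts a `method` argument directly, so no HeadRequest subclass is needed. `content_type_of` and `is_html` are hypothetical helper names; only `is_html` runs without a network connection.

```python
from urllib.request import Request, urlopen


def content_type_of(url, timeout=10):
    """Send a HEAD request and return the Content-Type header ('' if it is missing)."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Type", "")


def is_html(content_type):
    """True for 'text/html' with or without parameters such as '; charset=UTF-8'."""
    return content_type.split(";")[0].strip().lower() == "text/html"
```

Splitting on ';' mirrors the original's `conType.split(';')`, and lowercasing guards against servers that send "Text/HTML".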

After validating the HEAD request, the program downloads the entire page and feeds it to LinksHTMLParser. The following is the code for LinksHTMLParser:

class LinksHTMLParser(HTMLParser):

 def __init__(self):
  self.links = {}
  self.regex = re.compile('^href$')
  HTMLParser.__init__(self)
  
 
    
 # Pull the a href link values out and add to links list
 def handle_starttag(self,tag,attrs):
  if tag == 'a':
   try:
    # Run through the attributes and values appending 
    # tags to the dictionary (only non duplicate links
    # will be appended)
    for (attribute,value) in attrs:
     match = self.regex.match(attribute)
     if match is not None and not self.links.has_key(value):
      self.links[value] = value
      
     
   except Exception,details:
    print "LinksHTMLParser: "
    print details

You can see that I've inherited from HTMLParser and overridden the handle_starttag function so we only look at anchor tags that have an href value (in order to eliminate some tag processing). Then LinksHTMLParser adds each anchor link to an internal dictionary called links that holds the links on that processed page for Spider to further process.
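In Python 3 the module is html.parser, and the same subclass gets a little simpler. This is a sketch of a port, not the original code: html.parser already lowercases tag and attribute names, so a plain comparison can stand in for the '^href$' regex.

```python
from html.parser import HTMLParser


class LinksHTMLParser(HTMLParser):
    """Python 3 port: collects unique <a href> values in discovery order."""

    def __init__(self):
        super().__init__()
        self.links = {}  # dicts keep insertion order on Python 3.7+

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased, so "A HREF" matches too
        if tag != "a":
            return
        for attribute, value in attrs:
            # value is None for a bare attribute such as <a href>
            if attribute == "href" and value and value not in self.links:
                self.links[value] = value
```

Usage is the same as before: create a parser, call `feed(html)`, then read `parser.links`.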

Finally, Spider loops over the links found by LinksHTMLParser and skips any that aren't http or have already been visited. For a local (domain-only) crawl it also checks that the link's domain matches the "seed URL" domain; in either mode, the link is added only if it doesn't already exist in the pendingUrls dictionary.
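To check the logic of the main loop without hitting a live server, here's a Python 3 sketch of the same algorithm. `fetch(url)` is a hypothetical callback standing in for the HEAD and GET requests and returns a (content_type, html) pair; `extract_links(html)` stands in for LinksHTMLParser. Unlike the original, relative links are resolved with urljoin.

```python
from urllib.parse import urljoin, urlparse


def crawl(seed_url, mode, fetch, extract_links):
    """Crawl from seed_url following the pseudo-code; returns {url: html}."""
    seed_domain = urlparse(seed_url).netloc
    pending = {seed_url}   # set: order doesn't matter, fast membership tests
    visited = {}           # url -> html, like Spider.visitedUrls

    while pending:
        current = pending.pop()
        content_type, html = fetch(current)
        # Skip anything that isn't HTML (ignoring parameters like '; charset=UTF-8')
        if content_type.split(";")[0].strip() != "text/html":
            continue
        visited[current] = html

        for href in extract_links(html):
            link = urljoin(current, href)
            parsed = urlparse(link)
            if parsed.scheme != "http" or link in visited:
                continue
            if mode == "local" and parsed.netloc != seed_domain:
                continue  # 'local' crawls stay on the seed domain
            pending.add(link)
    return visited


if __name__ == "__main__":
    # Fake three-page site: each "page" is just whitespace-separated hrefs,
    # so str.split can stand in for a real HTML link parser.
    site = {
        "http://example.com/": ("text/html", "/a.html http://other.com/"),
        "http://example.com/a.html": ("text/html; charset=utf-8", "/ /doc.pdf"),
        "http://example.com/doc.pdf": ("application/pdf", ""),
    }
    found = crawl("http://example.com/", "local", lambda u: site[u], str.split)
    print(sorted(found))  # ['http://example.com/', 'http://example.com/a.html']
```

Passing the network and parsing steps in as callbacks is a design choice for testability; in a real crawler `fetch` would wrap the HEAD and GET requests.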

Areas for Crawler Customization and Enhancement

As written, the crawler doesn't do much more than get the links, parse them, count them, store each link's HTML in a dictionary, and print a total. Obviously you'd want to make it do something real, even something as simple as printing out the links it finds so you can review them. In fact, before posting this I had it doing just that (to standard out) after the main while loop finished (lines 90-91 in the full version):

for link in self.visitedUrls.keys():
 print link

You might customize that to write to a file instead of STDOUT so it could be further processed by external scripts.
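A minimal Python 3 sketch of writing the visited links to a file instead of STDOUT; `format_links`, `write_links`, and the sorted one-URL-per-line format are my own choices, not part of the original.

```python
def format_links(visited_urls):
    """One URL per line, sorted so repeated runs diff cleanly."""
    return "".join(link + "\n" for link in sorted(visited_urls))


def write_links(visited_urls, path):
    """Write the keys of a {url: html} dictionary to a text file."""
    with open(path, "w", encoding="utf-8") as out:
        out.write(format_links(visited_urls))
```

Calling `write_links(self.visitedUrls, "links.txt")` after the crawl (in a Python 3 port) would leave a plain list that grep, sort, or another script can pick up.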

Here are some other enhancements I'd suggest:

Use the crawler to parse the HTML content you've stored for each link in the visitedUrls dictionary. Say you're looking for some particular content on a site: you'd add a function that processes that HTML after the startcrawling function is complete, using another extended version of the LinksHTMLParser to do some other scraping.
Limit the "depth" the crawler reaches when it's an "all" search; e.g., take a value from the command line that limits how many links away from the seed URL the crawler will go, so the "all" version eventually stops.
Although this crawler is semi-polite because it requests the HEAD before the whole page, you'd really want to download the robots.txt file from the seed domain (and from each outside domain that the crawler accesses if you're hitting all domains) to ensure crawlers are allowed. You don't want to accidentally access some NSA website, scrape all the content, then have agents knocking at your door that afternoon.
This crawler makes requests on the webserver without any delay between requests and worst-case could bring a server down or severely slow it down. You'd likely put some kind of delay between requests so as to not overwhelm the target servers (use time.sleep(SECS))
Instead of making the HEAD request and checking the content-type to see if the pending URL is HTML, you could use a regular expression to check whether the URL ends in '/', '.asp', '.php', '.htm', or '.html' and then just request the page. This would avoid sending a HEAD request before every GET and reduce the load on the server.
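The robots.txt, delay, and URL-pattern suggestions above could be combined into something like this Python 3 sketch. It uses the standard library's urllib.robotparser; `PoliteFetcher`, `looks_like_html`, `robots_from_text`, and the "SimpleSpider" user-agent string are hypothetical names, and the `fetch` callable you pass in is assumed to do the actual download.

```python
import re
import time
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Paths ending in '/', '.asp', '.php', '.htm' or '.html' (case-insensitive)
HTML_LIKE = re.compile(r"(/|\.asp|\.php|\.html?)$", re.IGNORECASE)


def looks_like_html(url):
    """Guess from the URL path alone whether the page is HTML (no HEAD request)."""
    path = urlparse(url).path or "/"
    return HTML_LIKE.search(path) is not None


def robots_from_text(robots_txt):
    """Build a RobotFileParser from robots.txt text you've already downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    parser.modified()  # record a check time so can_fetch() uses the parsed rules
    return parser


class PoliteFetcher:
    """Wraps a fetch callable with a robots.txt check and a delay between requests."""

    def __init__(self, fetch, robots, delay_secs=1.0, user_agent="SimpleSpider"):
        self.fetch = fetch
        self.robots = robots
        self.delay_secs = delay_secs
        self.user_agent = user_agent
        self._last_request = None

    def get(self, url):
        if not self.robots.can_fetch(self.user_agent, url):
            return None  # disallowed by robots.txt
        if self._last_request is not None:
            wait = self.delay_secs - (time.monotonic() - self._last_request)
            if wait > 0:
                time.sleep(wait)  # the time.sleep(SECS) delay suggested above
        self._last_request = time.monotonic()
        return self.fetch(url)
```

The URL-pattern shortcut is a heuristic: extensionless paths such as '/about' are rejected, which is the trade-off for skipping the HEAD request.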

Preventing Duplicate HTML Content

One issue I thought of is it would be a good idea to enhance the crawler so it doesn't look at duplicate HTML content if your end goal is to test the actual page details for each link. The crawler is written so it definitely doesn't store duplicate links but that doesn't guarantee that the HTML content is unique. For example, on my site it's finding 382 total unique links even though I only have 15 posts. Where are all these extra links coming from?

It's the widgets I'm using in my template; for example, here are some 'widget' links the crawler found:

http://berrytutorials.blogspot.com/search/label/blackberry?widgetType=BlogArchive&widgetId=BlogArchive1&action=toggle&dir=open&toggle=YEARLY-1230796800000&toggleopen=MONTHLY-1262332800000

http://berrytutorials.blogspot.com/2009/11/create-custom-listfield-change.html?widgetType=BlogArchive&widgetId=BlogArchive1&action=toggle&dir=open&toggle=YEARLY-1262332800000&toggleopen=MONTHLY-1257058800000

Although these are unique links, they point to content that is already catalogued by the crawler when it found the main page (ex., look at the second link, it points to the article 'create-custom-listfield-change.html', and the crawler also holds the link to the actual page - the HTML content is duplicated).

To prevent this, I think you'd want a 'normalization' process after the crawl completes, where the found links are checked for duplicate content. Since I've stored the HTML for each link, the spider wouldn't have to reconnect to the crawled website; it would just check the stored HTML. I haven't thought this through enough to suggest a fast algorithm, so I'll leave that up to you.
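One straightforward starting point, sketched in Python 3: hash each stored page and group URLs whose HTML is byte-for-byte identical, which takes one pass over the dictionary. Exact hashing will miss near-duplicates such as the Blogger widget pages (the expanded archive changes the HTML slightly), so `strip_query` shows a cheaper URL-based heuristic; it's only safe on sites where the query string doesn't select the content (it would wrongly merge pages like '?id=3'). Both function names are hypothetical.

```python
import hashlib
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def group_duplicate_pages(visited_urls):
    """Group URLs whose stored HTML is identical; returns only groups of 2+ URLs.

    `visited_urls` maps url -> html, like Spider.visitedUrls.
    """
    groups = defaultdict(list)
    for url, html in visited_urls.items():
        if html in (None, "None"):
            continue  # the spider stores 'None' for pages it failed to parse
        if isinstance(html, str):
            html = html.encode("utf-8")
        groups[hashlib.sha256(html).hexdigest()].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


def strip_query(url):
    """Drop the ?query and #fragment, e.g. Blogger's widgetType=... parameters."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
```

Applied to the second widget link above, `strip_query` returns the plain 'create-custom-listfield-change.html' article URL that the crawler already holds.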

Wrap up and Roll

Although there are tons of open-source crawlers on the web I think that writing one yourself will definitely help you understand the complexities of link and content parsing and will help you actually visualize the explosion of links that are out there. For example, I set this crawler on www.yahoo.com and within a couple minutes it was up to over 2000 links that were in the queue. I was honestly surprised to find so many links just in my simple blog. It was a great learning experience and I hope this article helped you along the path to writing a more advanced crawler.

In case you're interested, here is an article about distributed crawlers (state of the art from 2003 :) here

As usual, let me know if you have questions/concerns in the comments!

Create Thumbnails from Image URLs for Django using ImageField, Urllib, and Python Imaging Library(PIL)

While working on a recent Django project I had the need to create thumbnails from images residing on a remote server to store within one of my Django app models. The wrinkle was that I wanted to programmatically make the thumbnail "on the fly" before storage since I didn't want to waste space storing the original, much larger, image. So if you are in a similar situation, where do you start?

Considering the vast array of libraries for Python, I hunted down the most referenced one, Python Imaging Library(PIL), and installed it. For purposes of this tutorial, I'll presuppose that you've already installed PIL on your platform and have some experience manipulating images using it. I'm working on OS X (Snow Leopard) and had no issues getting PIL working but if you do, follow the directions on this blog post. (I can't help if you're on Windows) If you can run the command 'from PIL import Image' from the python prompt then you've installed it properly, Django shouldn't complain, and the code below should work.

The Thumbnail Model

Once you've installed PIL your hardest struggles are over. For our simplified example we'll create a custom image model with 'url' and 'thumb' attributes. The 'url' attribute will store the url of the image in the event you need to reference the original picture and 'thumb' will be an ImageField that stores the location of the thumbnail we create. We'll define a function called 'create_thumb' that will perform the image manipulation. Here's what the Model looks like, including the required imports:

from PIL import Image
import os
import urllib
from django.core.files import File
from django.db import models
# ... the rest of your imports and models ...

class Thumbnail(models.Model):
 url = models.CharField(max_length=255, unique=True)

 # Set the upload_to parameter to the directory where you'll store the
 # thumbs (relative to MEDIA_ROOT)
 thumb = models.ImageField(upload_to='thumbs', null=True)

 """ Pulls image, converts it to thumbnail, then
   saves in thumbs directory of Django install """
 def create_thumb(self):

  if self.url and not self.thumb:

   # urlretrieve returns a (temp_filename, headers) tuple
   image = urllib.urlretrieve(self.url)

   # Create the thumbnail of dimension size (aspect ratio is kept)
   size = 128, 128
   t_img = Image.open(image[0])
   t_img.thumbnail(size)

   # Get the directory name where the temp image was stored
   # by urlretrieve
   dir_name = os.path.dirname(image[0])

   # Get the image name from the url and prepend "thumb"
   img_name = os.path.basename(self.url)
   thumb_name = "thumb" + img_name

   # Save the thumbnail in the same temp directory
   # where urlretrieve got the full-sized image; PIL picks
   # the output format from the file extension
   thumb_path = os.path.join(dir_name, thumb_name)
   t_img.save(thumb_path)

   # Copy the thumbnail into MEDIA_ROOT/thumbs (opened in binary mode)
   with open(thumb_path, 'rb') as thumb_file:
    self.thumb.save(thumb_name, File(thumb_file))

What is that Code Doing? Can it be Improved?

Although the code is commented pretty well, I'll give a bit more explanation. In Django, the ImageField doesn't actually store the image in your database; the database holds only the file's path. The image itself is stored under the directory your MEDIA_ROOT setting points to in settings.py. So make sure that MEDIA_ROOT is configured appropriately, then create the 'thumbs' subdirectory within that directory. That's what the 'upload_to' parameter of the ImageField refers to.

The 'create_thumb' method should be called when you create an instance of the Thumbnail model. Here's an example of one way you could use it:

a = Thumbnail(url='http://url.to.the.image.you.want.a.thumbnail.of')
a.create_thumb()
a.save()

The 'create_thumb' method takes the URL, creates the thumbnail, and saves it in the 'upload_to' directory. Since this is sample code I didn't add any provision for catching exceptions such as an improper URL or image processing errors; that's one area I'd suggest you improve upon. Also, the thumbnails are saved under the original image's name with "thumb" prepended. What happens if two URLs have the same image name? Depending on your storage backend, the new thumb may overwrite the old one (Django's default FileSystemStorage instead renames the new file to avoid the clash, which makes the stored names hard to predict), so you will want to add code that creates a unique name for each image.
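One way to get unique names is a random UUID plus the original extension. This is a Python 3 sketch (the model above is Python 2); `unique_thumb_name` is a hypothetical helper you'd use in place of prepending "thumb" to the image name, and the '.png' fallback for URLs without an extension is my assumption, chosen because PIL needs an extension to pick the save format.

```python
import os
import uuid
from urllib.parse import urlsplit


def unique_thumb_name(image_url):
    """Build 'thumb_<32 hex chars><original extension>' from an image URL."""
    path = urlsplit(image_url).path  # ignore any ?query part of the URL
    extension = os.path.splitext(os.path.basename(path))[1].lower()
    # Assumption: fall back to PNG when the URL has no extension
    return "thumb_%s%s" % (uuid.uuid4().hex, extension or ".png")
```

Because the name no longer depends on the URL's file name, two images both called 'photo.jpg' on different servers get different thumbnails.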

Besides the few caveats mentioned above, the code works as advertised and will make your life that much easier...at least when it comes to creating thumbnails. As usual, I only ask that if this post helped you, please leave a comment.

Django Configuration - Serve Static Media w/Templates

I must say that Django's documentation is all-around fantastic. In fact, I've never run into a situation where I was totally stumped and their docs didn't save the day. Regardless, there are always niches where you'll wish there was a slightly better real-world example; in this case, serving static media (stylesheets, JavaScript, image files, etc.) on the development server. Their doc on this subject, found here, covers the base configuration pretty well, but I still couldn't get Django to recognize my static media for some reason. After messing with the config for a bit I managed to get it working, so if you're having the same problem, follow these steps and you'll have it running in no time.

The tutorial assumes basic knowledge of Django and that your project is located at the path (on OS X) '/Users/[USER_NAME]/Code/Django/[PROJECT_NAME]', where USER_NAME is your OS X account name and PROJECT_NAME is the top-level directory of your project, likely the one created when you ran the 'django-admin.py startproject [PROJECT_NAME]' command. For example, the path to my project is '/Users/john/Code/Django/testproj'. Your code doesn't have to be on this exact path, but you'll need to adjust the paths in the code below accordingly.

Note: In the second section below I give two different ways to configure your settings.py file. The first hardcodes the paths into the variables; the second uses Python's built-in os module to build absolute paths to your static files. I kept both as a demonstration of how it works, but I highly recommend the absolute path route since your code will be portable across systems. It should also work on Windows without modification.

Configure URLconf - Make Static Media View Available for DEBUG Only

First, open the top-level urls.py file and add the following settings.DEBUG code after the urlpatterns that already reside there. The following is a basic example of my urls.py file:

from django.conf.urls.defaults import *
from django.conf import settings
from testproj.base.views import index

urlpatterns = patterns('',
    (r'^$', index),
)

# Serve static media through Django only while DEBUG is True;
# the named group 'path' is passed to the serve view as its path argument
if settings.DEBUG:
    urlpatterns += patterns('',
        (r'^static/(?P<path>.*)$', 'django.views.static.serve',
            {'document_root': settings.MEDIA_ROOT}),
    )

As explained in Django's documentation, in a production environment your web server, not your Django code, should serve the static files. Wrapping the static pattern in the settings.DEBUG test ensures Django stops serving them once you move to production, since DEBUG will be False there.

In the code you're importing settings from django.conf. Inside the settings.DEBUG block, any request matching 'static/PATH' is routed to the django.views.static.serve view: the named group in the regex passes PATH to the view as its 'path' argument, and the extra dictionary passes 'document_root' set to whatever path is found in settings.MEDIA_ROOT. Next we're going to set the value of that path in the settings.py file.
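The regex in urls.py needs a named group, (?P<path>...), because django.views.static.serve expects the file path as a keyword argument named path; without the name, Python's re module rejects the pattern outright. The sketch below only imitates the URL-matching step with the re module (Django does much more internally) to show which keyword arguments the view ends up with. static_view_kwargs is a hypothetical name, and the input is written without a leading '/' because Django matches URL patterns against the request path with that slash removed.

```python
import re

# The static pattern from urls.py, written with its named group
STATIC_PATTERN = re.compile(r'^static/(?P<path>.*)$')


def static_view_kwargs(url_path, document_root):
    """Imitate the keyword arguments the static view would receive.

    Illustration only: returns None when the URL doesn't match, otherwise the
    captured named groups plus the extra 'document_root' option.
    """
    match = STATIC_PATTERN.match(url_path)
    if match is None:
        return None
    kwargs = match.groupdict()          # {'path': ...} from the named group
    kwargs['document_root'] = document_root
    return kwargs


print(static_view_kwargs('static/css/style.css', '/Users/john/Code/Django/testproj/static/'))
```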

Configure Django Settings to the Location of Your Media

The first step is to create a directory named 'static' in the Django project folder, with two directories inside it named 'css' and 'scripts'. In the future you'll place your static files in folders within the main 'static' directory. The way you organize them doesn't make a difference; just make sure it's a logical setup and that you adjust your template tags to point to the right folder (refer to the next section).
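If you'd rather script this step than create the folders by hand, the following sketch does it with the standard library. create_static_dirs is a hypothetical helper; pass it your project directory (for example '/Users/john/Code/Django/testproj'). The exist_ok argument to os.makedirs requires Python 3.2 or later.

```python
import os


def create_static_dirs(project_dir, subdirs=('css', 'scripts')):
    """Create <project_dir>/static plus the given subfolders; return the static path.

    Hypothetical helper. exist_ok=True makes it safe to run more than once.
    """
    static_dir = os.path.join(project_dir, 'static')
    for name in subdirs:
        # makedirs also creates 'static' itself on the first iteration
        os.makedirs(os.path.join(static_dir, name), exist_ok=True)
    return static_dir
```

Calling create_static_dirs('/Users/john/Code/Django/testproj') would leave you with static/css and static/scripts inside the project folder.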

Option #1: Hardcoded Path

Open your settings.py file and modify the MEDIA_ROOT, MEDIA_URL, and ADMIN_MEDIA_PREFIX settings as follows (remembering to change the path to the location of your own static folder):

MEDIA_ROOT = '/Users/john/Code/Django/testproj/static/'

MEDIA_URL = '/static/'

ADMIN_MEDIA_PREFIX = '/media/'

Now, double-check that you didn't forget the beginning and ending '/' on each of the paths you modified, as this will confuse Django. Note that while ADMIN_MEDIA_PREFIX and MEDIA_URL don't strictly have to be different, Django recommends it. If you've wandered in here looking for instructions for Windows, I believe the only difference in the entire tutorial is changing the MEDIA_ROOT setting to 'C:/path/to/your/static/folder'. The rest of these steps should work on Windows as well, but I didn't have time to validate that.

Option #2: Absolute Path - Recommended

Instead of hardcoding as mentioned above, I'd recommend going the absolute path route since you can port your code from system to system without rewriting the settings.py file. Remember, either option should work but only use one way or the other.

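Instead of hardcoding the paths into settings.py, you'll import the os module and build ROOT_PATH from os.path.abspath(__file__), which is the location of settings.py itself; wrapping it in os.path.dirname leaves the project directory. (Calling os.path.abspath('') instead returns the current working directory, which only matches your project folder if you start the server from there.) Then you'll set MEDIA_ROOT using the os.path.join method to attach your static directory. MEDIA_URL and ADMIN_MEDIA_PREFIX are URL prefixes rather than filesystem paths, so they keep the same values as in Option #1. Refer to the code in settings.py below:

import os

# Directory containing this settings.py file (the project folder),
# regardless of where the development server is started from
ROOT_PATH = os.path.dirname(os.path.abspath(__file__))
# ...
# ...
# Filesystem path where the static files live
MEDIA_ROOT = os.path.join(ROOT_PATH, 'static')
# URL prefixes, not filesystem paths: keep the leading and trailing '/'
MEDIA_URL = '/static/'
ADMIN_MEDIA_PREFIX = '/media/'

Notice that when you use os.path.join(ROOT_PATH, 'static') you don't put a '/' before and after the static directory as we did in the previous option; os.path.join inserts the right separator for your operating system.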
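To see how the choice of base directory and os.path.join behave, here is a small standard-library check you can run anywhere. project_dir_from is a hypothetical helper that mirrors the os.path.dirname(os.path.abspath(__file__)) idiom; the Windows path is a made-up example, and posixpath/ntpath are the POSIX and Windows implementations that os.path picks between depending on the host.

```python
import ntpath
import os
import posixpath


def project_dir_from(settings_file):
    """Folder containing the given file; in settings.py you'd pass __file__."""
    return os.path.dirname(os.path.abspath(settings_file))


# os.path.abspath('') is simply the process's current working directory,
# so it changes with the folder you launch manage.py from.
print(os.path.abspath('') == os.getcwd())  # True

# Anchoring on the settings file gives the same answer from any directory
# (this example path assumes a POSIX system such as OS X).
print(project_dir_from('/Users/john/Code/Django/testproj/settings.py'))
# /Users/john/Code/Django/testproj

# os.path.join inserts the separator of the OS it runs on; compare both flavours.
print(posixpath.join('/Users/john/Code/Django/testproj', 'static'))
# /Users/john/Code/Django/testproj/static
print(ntpath.join('C:\\Code\\Django\\testproj', 'static'))
# C:\Code\Django\testproj\static
```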

Modify your Template to Point at the Static Directory

The final step is to make sure you've modified the HTML tags (script, link, etc.) in your template to point at your 'static' directory. For example, here's how the script tag is configured for the example project:

<script src="/static/scripts/scripts.js"></script>

The script src is set to "/static/scripts/scripts.js" since this is where I've placed my JavaScript files; link tags for stylesheets follow the same pattern, pointing into '/static/css/'. Again, don't forget to put the initial '/' at the beginning of your path.
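Why the initial '/' matters: the browser resolves a src or href without one relative to the URL of the current page. The sketch below uses Python 3's urllib.parse.urljoin, which follows the same URL-resolution rules (RFC 3986), to show the difference. The page URL /blog/post/ is hypothetical; the example project only defines the root page, where both forms happen to resolve to the same place.

```python
from urllib.parse import urljoin

# Hypothetical page on the development server, one level below the root
page = 'http://localhost:8000/blog/post/'

# Without the leading '/', the path is resolved relative to the current page
relative = urljoin(page, 'static/scripts/scripts.js')
# With the leading '/', it is always resolved from the site root
rooted = urljoin(page, '/static/scripts/scripts.js')

print(relative)  # http://localhost:8000/blog/post/static/scripts/scripts.js
print(rooted)    # http://localhost:8000/static/scripts/scripts.js
```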

Wrap it Up and Call it a Day

Test it out by running the 'python manage.py runserver' command from your project directory, and everything should be working beautifully. If you run into problems, it's likely that you've either forgotten a '/' or added one in the wrong place in the paths you've configured.
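As a last sanity check for those slash problems, here is a small hypothetical helper you could call from a Django shell with your three settings. It encodes the rules used in this post: MEDIA_URL and ADMIN_MEDIA_PREFIX start and end with '/', and MEDIA_ROOT is an absolute path. It assumes site-relative prefixes like '/static/'; a full URL (for instance a separate media host) would be flagged even though Django accepts one.

```python
import os


def check_media_settings(media_root, media_url, admin_media_prefix):
    """Return a list of likely slash mistakes in the three media settings.

    Hypothetical helper; an empty list means nothing suspicious was found.
    """
    problems = []
    for name, value in (('MEDIA_URL', media_url),
                        ('ADMIN_MEDIA_PREFIX', admin_media_prefix)):
        if not value.startswith('/'):
            problems.append('%s should start with "/": %r' % (name, value))
        if not value.endswith('/'):
            problems.append('%s should end with "/": %r' % (name, value))
    # MEDIA_ROOT is a filesystem path, so only require it to be absolute
    if not os.path.isabs(media_root):
        problems.append('MEDIA_ROOT should be an absolute path: %r' % media_root)
    return problems
```

In a shell started with 'python manage.py shell', you could run: from django.conf import settings, then check_media_settings(settings.MEDIA_ROOT, settings.MEDIA_URL, settings.ADMIN_MEDIA_PREFIX).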