m.stefankendall.com – My iPhone web apps

May 22, 2010

Recently, I’ve become very interested in iPhone web app development and Grails, so I decided to merge the two into a little project. Using Grails, jQTouch, and jQuery, I built an application to search a variety of torrent portals and aggregate the results. I could describe the application, but why not just head over to m.stefankendall.com and check it out?

If you don’t have an iPhone or Android-like device, here are some screenshots to show you what the interface looks like on a mobile device:

iPhone Web App Screenshot 1 iPhone Web App Screenshot 2 iPhone Web App Screenshot 3

Working with this project, I discovered a few interesting bits of information.

  1. jQTouch is incomplete
    I had to work through the source code of the CSS files and demo page to make any headway into this library. There is very little documentation, and some of the pre-built “apple” styles don’t look truly native. This is sad, but it does do some things pretty well. For instance, if you bind $.tap event to a jQuery object, and the target browser doesn’t support the $.tap javascript event, the event is bound to $.click instead. This is great for testing. I also found that while Safari for Windows is a close approximation for the iPhone browser, it’s not the same.

    For instance, if your page has any HTML errors, such as improperly closed <input /> tag, Safari for Windows won’t complain. Safari for the iPhone, however, will break all to hell and respond haphazardly to javascript events. To prevent this from happening to you, I highly recommend enabling Safari developer mode. See the screenshots below.

    How to enable Safari Developer mode How to enable Safari Developer mode

  2. Groovy has built-in JSON conversion support
    At first, I thought I was going to need to use something like GSON to automatically convert my objects into JSON, but Groovy can do this with built-in libraries. Check this out:

    		def namedResults = [:]
    
    		results.each { ProviderService key, value ->
    			namedResults[key.getName()] = value
    		}
    
    		render namedResults as JSON
    

    This produces

    {"Demonoid":["Result1", "Result2",...],...}

    which is trivially parseable via the javascript JSON parser at JSON.org.

  3. GPath is a decent substitute for XPath
    I’m generally a big fan of XPath, but GPath ain’t so bad. Specifically, trying to find all the deepest nodes which match a certain criterion is pretty easy.

    		def slurper = new XmlSlurper()
    		slurper.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false)
    
    		def doc = slurper.parseText(xml)
    
    		//Find anchor tags for torrent links
    		def allAnchorTags = doc.depthFirst().findAll { GPathResult node ->
    			node["@class"] == "detLink"
    		}
    

    By combining depthFirst() and findAll {…}, one can filter all nodes in an XML document succinctly and with ease. The equivalent XPath //a[@class='detLink'] would require some extra lifting to parse the results so neatly.

  4. Auto-wiring is magical
    In the course of building the TorrentSearch application, I used nothing but auto-wiring to configure my services and controllers. This probably doesn’t scale all that well, but it worked well enough for my purposes. I tend to use the path of least resistance until it no longer works, and auto-injection via field name is definitely the path of least resistance. Code snippet:

    class TorrentSearchController
    {
    	def torrentProviderService
               ...
    }
    

    Simply by defining the service, it gets auto-injected with a singleton instance of the class named TorrentProviderService.

Conclusion

Grails is a really neat way to develop applications, and it makes putting together simple web services incredibly easy. In the future, I’ll see how well this scales in the future, but for now it seems like the way to build super quick web applications on the JVM. jQTouch, on the other hand, is rough around the edges, although the potential of native web applications for mobile devices is promising.

Easy HTML web scraping with Groovy and Java. (w/XOM)

May 5, 2010

Ever need to parse some HTML is Java or Groovy? No matter what the source, you’re almost always guaranteed to get bad, unformed garbage as a response when scraping. Rather than ditch XML readers and bust out regex, you can transform this data into good xhtml with tools like TagSoup.

The following class is a utility class meant to abstract away the nonsense of cleaning HTML. I’ve found that HTML data can come from a number of different sources (even files), so being able to clean strings containing html is immensely useful.

HtmlParsingUtils.cleanHtml(String html) cleans the HTML passed as an argument and returns parse-able XHTML. It does not validate entities to avoid the 503 w3.org return errors if you attempt to hit the w3 servers too frequently when validating XML entities – which is exactly what happens when using TagSoup when doing massive scraping.

XOM is my XML handling library of choice, although the code can be adapted if requirements dictate no-XOM.

Example usage:

String htmlData = obtainHtmlFromElsewhere();
String xml = HtmlParsingUtils.cleanHtml( htmlData );

Builder builder = HtmlParsingUtils.createNonValidatingXmlBuilder();
Document doc = builder.build(xml,HtmlParsingUtils.xhtmlBaseUri);
//Query: Find all anchor tags
Nodes nodes = doc.query("//html:a",HtmlParsingUtils.htmlCtx);

Presented below is the Groovy adaptation of HtmlParsingUtils. The Java version is identical, except the dummy entity resolver is built as private class declaration inline, rather than a closure.

import nu.xom.Builder
import nu.xom.XPathContext
import org.jdom.input.SAXBuilder
import org.xml.sax.EntityResolver
import org.xml.sax.InputSource
import org.xml.sax.XMLReader
import org.xml.sax.helpers.XMLReaderFactory

/**
 * Utility class for handling html
 */
class HtmlParsingUtils
{
	public static final EntityResolver dummyEntityResolver = { String publicId, String systemId ->
		return new InputSource(new StringReader(""))
	} as EntityResolver;

	public static final xhtmlBaseUri = "http://www.w3.org/1999/xhtml"

	//Required for querying
	public static final XPathContext htmlCtx = new XPathContext("html", xhtmlBaseUri)

	/**
	 * @param str possibly non-well-formed html
	 * @return xhtml representation of the argument data
	 */
	public static String cleanHtml(String str)
	{
		SAXBuilder builder = new org.jdom.input.SAXBuilder("org.ccil.cowan.tagsoup.Parser"); // build
		builder.setEntityResolver(dummyEntityResolver);
		Reader input = new StringReader(str);
		org.jdom.Document doc = builder.build(input);
		String cleanXmlDoc = new org.jdom.output.XMLOutputter().outputString(doc);

		return cleanXmlDoc;
	}

   /**
	 * @return XOM Builder that does not validate or resolve entities.
	 */
	public static Builder createNonValidatingXmlBuilder()
	{
		XMLReader reader = XMLReaderFactory.createXMLReader();
		reader.setEntityResolver(dummyEntityResolver);
		Builder builder = new Builder(reader);

		return builder;
	}
}

Grails: Absolutely pure productivity.

April 30, 2010

Your IDE can do more.

Recently, I decided I had a need to build a quick web service to back an AJAX application for iPhone that I’m developing. jQTouch is making the front-end work pretty simple, but I just wanted a super fast way to get a web service up and running that would be maintainable and easy to scale. I considered Spring with Tomcat/Hibernate, but I didn’t want to spend a week configuring XML. I’ve attempted to Ruby/Rails in the past, but I always missed being off the JVM. Then, I came upon Grails. It’s fast, it’s easy, and it’s Java (…ish), so in that sense there’s little to no learning curve aside from that of the Grails framework itself.

To begin my quest to utilize Grails, I attempted to setup Eclipse with the grails plugin. After a pound of headache, I just gave up. Then, I came upon Netbeans grails integration.

Check this out.

  1. Create a project.
    Step 1
  2. Let grails do its magic.

    Step 2

  3. Click the magic run arrow.
    Step 3
  4. Watch Grails start automagically under Jetty.
    Step 4
  5. Create a Domain class.
    Step 5
  6. Step 6

Now, any changes you make in the IDE to the controllers, services, or domain classes gets picked up immediately by the running webserver. In one click, you can deploy your entire application. After that one click, everything just works. This is how all IDEs should handle server deployment. Aside from this ridiculously easy setup, grails gives you dependency management off the bat with Ivy, which can be configured to use the maven2 repositories.

Check it out.

Dependency management

FURTHERMORE, grails deploys WARs for your application, so there’s no trickery to get your web application running. Develop under grails, then drop your WAR in your application container and be done with it.

Use Grails. It’s awesome.

Still Alive.

April 27, 2010

I’m currently working on a few side projects while trying to learn and use/deploy applications on Grails. I’ve also purchased another domain for my conquests. Hopefully I’ll have more to say about my side projects in the next few months, but I’ll hopefully get a Grails post up in early May. Expect pretty screenshots.

OAuth is a joke.

March 13, 2010

From oauth.net, OAuth is

An open protocol to allow secure API authorization in a simple and standard method from desktop and web applications.

Unfortunately, as it turns out, OAuth is neither simple nor standard in practice. In case you don’t care to read the tutorials, OAuth works something like this:

  1. You request a consumer key and consumer secret from a “service provider” (such as Twitter).
  2. You use this consumer key and secret to request a “request key” (insecure).
  3. You use the request key to generate a URL that the user can go to to authenticate.
  4. You either do, or don’t, pass a callback URL to the above request to receive the response token.
  5. Your callback URL gets hit, which contains the token needed to request an “access token” and “access token secret”, which may or may not expire or become invalid at any time.

These steps, however, are not so clear cut when you actually start using service providers which offer OAuth authentication.

Let’s look at the behavior of some big-name web services which have adopted OAuth.

Twitter logo

Twitter, I think, has taken a progressive approach to OAuth authentication. If you don’t pass a callback URL, you get a pretty screen prompting the user to deliver a PIN code to the application from which you’ve come.

Twitter pretty pin entry

This looks great, right? Well, if you try the same stuff on TripIt, you just get the return data.
TripIt logot

Response:

oauth_token=935394b819ab32c9558e7063dbb3bcda9c274a00

TripIt presents no user display, so you’re stuck if you didn’t want to a provide a callback URL. Once you understand that Twitter just does things differently, you begin to assume that using a callback URL would work swimmingly. Well, if you wanted Twitter support, you just lost it. Look at Twitter’s response to a request with a callback URL specified:

Response URL:

http://stefankendall.com:8080/test.jsp?oauth_token=0yLxRrUoxHJxA0tF8hJSxP2EHfFaYyw1a3nW4Dqg&oauth_verifier=Alzk3ehPRq6bbjvISlgXqn9Jk7yZJlmst0jtXpg

That’s right. TWO values, whereas TripIt just provided one. You’ll notice that oauth_token is now the token you requested with, and oauth_verifier is the real data you need. This is a direct contradiction in the naming and response schemes of TripIt and Twitter. Oy vey.

I would go on about Netflix, but that’s a jump down the rabbit hole I’m not willing to take right now.

OAuth is a joke, and as it stands, it’s neither simple nor a standard. Good grief.

100 ways to skin an interview question

February 24, 2010

In a recent interview, I was asked the following question:

Write a program to determine the 10th Fibonacci number.

Without further specification, this problem is worthless. Depending on the scope and nature of the program, the solution to this problem will vary wildly. For example, will this function get called in a tight inner loop? What are the memory constraints? What are the concerns with respect to reusability? I’ll make this more concrete below.

Naive solution

def F(n):
   if n == 1 or n == 0:
      return 1
   return F(n-1) + F(n-2)

Why this is completely wrong: What happens when you pass a fractional value to F(n)? You break. What happens if you pass a negative value? You break. You could fix this by checking if n <= 1 in the base case, but then you still have unneeded redundancy. Without explicitly defining the interface by which you generate a series, you've broken your code completely. Expect bugs, and expect huge memory waste. These results could be memoized, but consider the initial problem statement. Was there any implication that anything other than F(10) would need to be computed? No. This gives us a rather smart-ass solution.

Efficient Solution

Consider the following:

#define Fib10 89

If you can’t trust the compiler, memory constraints are tight, or performance is an issue, precomputation is clearly the solution. You KNOW F(10) without any programming whatsoever. You can compute it by hand and yield the above result. A more scrupulous reader might notice that F(10)==89 isn’t easily verifiable without writing code. A better solution might look like this.

Verifiable Efficient Solution

#define Fib10 (1+1+2+3+5+8+13+21+34+55)

It’s still sloppy, but it’s as efficient as the previous solution and easier to verify. This solution, however, has a fatal flaw: it only works for one value of N (although you could define more), and it’s not reusable at all. A compromise between efficiency and reusability might give you the following.

The Compromise Solution

Fib = [1,1]
for i in range(2,100):
   Fib.append( Fib[i-1]+Fib[i-2] )

This has no edge cases, it’s rather easy to verify, and you can adjust the maximum required value as necessary. This requires a fixed max size, though, which means it’s sure to break at some point. At best, you’re stuck with documenting this limitation. Once again, we consider memoization, but we can remember to avoid the unnecessary recursion to keep the solution a bit more efficient.

So, then, what do you do? Create something reusable.

The Reusable Solution

The code is getting funkier, so I’ll leave this as an exercise to the reader for now. Essentially, we can encapsulate the compromise solution in a class. If a client requests a Fibonacci number greater than that which we’ve precomputed, we compute up to the given value and store the intermediate computations. This solution, however, ignores the fact that most would ignore when approaching this problem.

The Math Solution

Namely, Fibonacci numbers can be generated in constant time with Binet’s formula.

For reasonable values of N, the rounding formula provides exact values.

The Real Solution

So what, then, is the right answer? It really depends on information left out of the problem statement. In the real world, you would probably just use a math library to get the values, knowing that someone has gone through the same thought process through which you’ve just gone. If this isn’t possible, then you have to consider one of the above methods.

One might argue that this reasoning ability is what an interviewer should be looking for, but that was not the emphasis of the question. Really, there are just better questions you can ask a person to figure out if they’re worth their salt, unless, of course, you’re really just pre-screening, at which point you’re just looking for the response “89.”

Copying a file in Java

February 17, 2010

Code snippet. Why isn’t this part of the Java API?

Just use commons.

<dependency>
   <groupId>commons-io</groupId>
   <artifactId>commons-io</groupId>
   <version>1.4</version>
</dependency>

FileUtils gives us this:

public static void copyFile(File srcFile, File destFile) throws IOException

Ruby and named regex matched groups.

February 17, 2010

Back from China, I’ve resume my normal programming schedule. I’m working through Pragmatic Bookshelf’s Programming Ruby, and there’s a lot more to ruby than I initially gave credit. For example, consider the following:

str = 'an' * 3000 + 'garbage' + 'an'*3000
r = /(?<pattern>.*) #The first part
    (?:.*?) #Garbage
    \k<pattern>/x #The first part, repeated.
str =~ r
puts $1.length

First, we create a repeated string, followed by some garbage data, followed by that string again. A pattern is built to find the largest prefix which is also a suffix, given a break in data. Ruby lets us name the regex

A break from (sexy) learning

January 21, 2010

For the past month, I’ve been unable to study anything new and sexy. Rather, I’ve been preparing for the ICPC 2010 World Finals.

Most recently, I’ve been working on this:

ICPC Challenge Screenshot

Essentially, contestants build an AI to battle it out in an FPS-esque arena. It was really quite interesting, although it required serious time input.

For more info, look here. My team really only had one good pre-submission, but our final product should be competitive.

If you’re interested in participating in AI battling tournaments, you might want to check out the newly created ACM Queue Challenge.

XSLT: How you should be transforming XML into HTML.

January 5, 2010

XSLT is a method of transforming XML with a set of rules, functions, and selectors. Transformations can be complex or simple, but the benefit of using XSLT is immediately obvious after first use.

Consider the following:

<posts>
   <post>
      <title>XSLT!</title>
      <description>A short post about XSLT</description>
      <content date = "01/03/2009">
      Lorum ipsum
      </content>
   </post>
<posts>

The naive approach of displaying this to the user is to manually parse the data with an XML parser of some kind. I’ve done this in previous projects; it works, but it’s generally stuck in code, which means that changes to the output format require recompilation or some other elaborate build and deployment. This is inherently wrong given the process that’s taking place. A transformation of XML can and should be independent from the XML itself.

Here’s some pseudocode for the rendering of an XML document into HTML:

html = '<html><body>'
for node in document.find( 'post' )
    title = node.find( 'title' ).text
    descriptionNode = node.find( 'description' )
    postDate = descriptionNode.attribute( 'date' )
    descriptionText = descriptionNode.text

    html += '<div>'
    html += '<h1>' + title + '</h1>'
    html += '<p>' + descriptionText + '</p>'
    html += '<p><em>' + postData + '</em></p>'
    html += '</div>'

html += '</body></html>'
render(html)

While this is straightforward enough in the trivial example, it falls apart when parsing large, complex documents. Consider now the XSLT equivalent:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:template match="/">
        <html>
            <body>
                <xsl:apply-templates />
            </body>
        </html>
    </xsl:template>
    <xsl:template match = "post">
        <div>
            <xsl:apply-templates />
        </div>
    </xsl:template>
    <xsl:template match = "title|description">
        <p><xsl:apply-templates /></p>
    </xsl:template>
    <xsl:template match = "content">
        <p><xsl:apply-templates /></p>
        <p><em><xsl:value-of select="@date" /></em></p>
    </xsl:template>
</xsl:stylesheet>

Although the trivial example is longer in terms of characters, there are numerous benefits to this approach.

  1. The transformation is externalized. If you decide that all post divs now need a “post” class, you don’t need to recompile, and you know exactly where to make the change.
  2. Presumably, your XSLT editor can validate XML, or at least tell you when you’re missing a closing angle bracket or two. Most IDEs will not see unclosed angle brackets within strings as a problem, so in you get a proper environment in which to craft your output HTML. Sure, you could use an XML library that hides this implementation from you, but you still run the risk of missing an element or forgetting to append nodes to one another. This wastes time. With a XSLT document, you can very clearly easily about your output before testing. This emphasizes another issue.
  3. Good XSLT editors will let you quickly and easily run XSLT against XML. Using oXygen, I was able to develop the above document in under a minute. Actually testing manually-manipulated XML, on the other hand, takes much more time.

All of this said, there are still reasons to avoid XSLT in certain situations. All browsers do not support XSLT, and there are few XSLT 2.0 processors at the moment. Ergo, you must perform your transformations on the server, rather than on the client, but you should probably be doing that anyway. Furthermore, maintenance programmers will need to know XSLT to debug and extend your application, so if you’re worried about limited knowledge handling your software, it may be best to simply avoid XSLT altogether.

I’ve quickly found that XSLT is often the right tool for the job, however, so it would be a sin to not learn enough of the technology to recognize when its applicable and when it isn’t. Don’t craft your HTML by hand, but only use XSLT when it’s appropriate.

 
Powered by Wordpress and MySQL. Theme by Shlomi Noach, openark.org