Wednesday, July 11, 2007

Countdown to Chicago

Next week I'm going to Chicago to present to the mentors at the RHLCCC, so I've been spending most of my time working on a slideshow. I'm pretty excited and can't wait to meet everyone there. We're also going to conduct a focus group to see what people think of the beta release and what new features they would find useful. I imagine it's going to be pretty hectic, since I get in Sunday night, present Monday morning, run the focus group after lunch and fly out Monday evening. Anyway, I don't have much else to say; I just wanted to post a quick update.

Wednesday, June 13, 2007

Putting the Pieces Together

You can imagine how frustrated I was when the day after my last post, Google decided to release version 1.4 of the GWT, which may have fixed the difficulties with cross-domain scripting in the GWT. However, I am not even going to bother testing whether or not the changes they made are adequate until I've finished developing the g3p with the implementation (mac-1.3.3) I've got.

Anyway, after I added the saving of preferences/favorites, it was time to work on the actual search mechanism. In order to do this I ended up wrapping the Entrez utilities with a Java API. This went surprisingly smoothly, and even though it still had bugs on the first try, the error reporting mechanism was good enough to pinpoint where they were. Here is a diagram overview of the way the search works:



I completely isolated the Entrez search system into a subpackage of the g3p project, and I am considering releasing it as a separate project for AJAX apps that need to access the Entrez databases. Basically it allows you to create pipelines using the Entrez utilities. Each utility can format its results however it wants to, since each receives XML of a different DTD. Certain utilities provide input for other utilities, so you can chain them together in different ways to achieve different searching goals. I am not yet sure how many distinct pipelines will be needed for the g3p; right now I am only using ESearch->EFetch for searching articles in the PubMed database.
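To make the pipeline idea concrete, here is a minimal sketch of an ESearch->EFetch chain. It is plain Java rather than GWT code, every class and method name in it is made up for illustration (none come from the g3p source), and the "requestor" is just a function from URL to XML text so a fake one can be plugged in. It also assumes the search term is already URL-safe and runs synchronously, which the real gadget would not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: none of these class or method names come from the g3p source.
class EntrezPipelineSketch {
    static final String BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/";

    /** Builds an ESearch request for PubMed. The term is assumed to be URL-safe already. */
    static String esearchUrl(String term) {
        return BASE + "esearch.fcgi?db=pubmed&term=" + term;
    }

    /** Builds an EFetch request for the given PubMed ids. */
    static String efetchUrl(List<String> ids) {
        return BASE + "efetch.fcgi?db=pubmed&retmode=xml&id=" + String.join(",", ids);
    }

    /** Pulls the <Id> values out of an ESearch response (a regex is enough for a sketch). */
    static List<String> parseIds(String esearchXml) {
        List<String> ids = new ArrayList<>();
        Matcher m = Pattern.compile("<Id>(\\d+)</Id>").matcher(esearchXml);
        while (m.find()) {
            ids.add(m.group(1));
        }
        return ids;
    }

    /** Runs ESearch, feeds its ids into EFetch, and returns the EFetch XML. */
    static String searchArticles(String term, Function<String, String> requestor) {
        String searchXml = requestor.apply(esearchUrl(term));
        return requestor.apply(efetchUrl(parseIds(searchXml)));
    }
}
```

The point of the shape is that each stage only knows how to build its own request and read its own response, so a different pipeline is just a different way of wiring the stages together.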

The XmlParser and XmlRequestor interfaces that the Entrez system talks to have to be implemented differently depending on how the search system is going to be embedded. For the g3p I wrote an IGXmlRequestor class that uses the gadget _IG_FetchContent JavaScript method to retrieve its results. It would seem more natural to use _IG_FetchXmlContent, but this returns a document parsed by the browser DOM instead of text. There may be advantages to doing it this way, but for now parsing using the GWT XML packages seems to provide a much cleaner implementation.

When exceptions or errors occur during the search process, they are all bubbled back to the ResultsBox as well, which can then display what went wrong. The overhead of wrapping the entire Entrez system in this way is worth the time it saves in debugging and adding features.
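Here is a rough sketch of what a pluggable requestor with error bubbling could look like. The names XmlRequestor and ResultsBox echo the ones above, but every method signature here is my assumption for illustration, and this is plain Java standing in for the GWT classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the interface names echo the post, but every method here is assumed.
class ErrorBubblingSketch {
    /** Receives either the raw XML text or the error that stopped the request. */
    interface Callback {
        void onSuccess(String xml);
        void onError(Exception error);
    }

    /** Fetches XML as text; the embedding (gadget, standalone page) supplies the implementation. */
    interface XmlRequestor {
        void request(String url, Callback callback);
    }

    /** Stand-in for the ResultsBox: it records what it would display. */
    static class ResultsBox implements Callback {
        final List<String> lines = new ArrayList<>();

        public void onSuccess(String xml) {
            lines.add("results: " + xml);
        }

        public void onError(Exception error) {
            lines.add("search failed: " + error.getMessage());
        }
    }

    /** Runs one search; any failure is handed to the same callback instead of being swallowed. */
    static void search(String url, XmlRequestor requestor, Callback display) {
        try {
            requestor.request(url, display);
        } catch (RuntimeException e) {
            display.onError(e);
        }
    }
}
```

Because the display is the only thing that ever hears about a failure, swapping in a different requestor (say, one for a standalone page) does not change how errors reach the user.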

The other major change I made was to make the overflow scroll properly. Everything in the user interface is more complicated than it seems like it should be, because I want the g3p to work not just as a gadget, but as a standalone web application. This means heights and widths are almost never fixed, and so setting the proper max-height for the scroll bars to start showing up meant writing a WindowResizeListener to figure out the height of the window and set max-height accordingly.
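The resize logic boils down to something like the sketch below. It is plain Java with stand-in types (the ResizeListener and ScrollPanel here are hypothetical, not the GWT classes), and the amount of space reserved for the rest of the interface is an assumed parameter.

```java
// Hypothetical sketch: plain Java standing in for GWT widgets; names and numbers are assumed.
class ScrollHeightSketch {
    /** Mirrors the shape of a resize callback: it is told the new window size. */
    interface ResizeListener {
        void onWindowResized(int width, int height);
    }

    /** Stand-in for a scrollable panel; in the browser this would set the CSS max-height. */
    static class ScrollPanel {
        String maxHeight = "none";
    }

    /** Leaves room for fixed chrome (search box, tabs) and never returns a negative height. */
    static int maxHeightFor(int windowHeight, int reservedPixels) {
        return Math.max(0, windowHeight - reservedPixels);
    }

    /** Builds a listener that keeps the panel's max-height in step with the window. */
    static ResizeListener listenerFor(ScrollPanel panel, int reservedPixels) {
        return (width, height) -> panel.maxHeight = maxHeightFor(height, reservedPixels) + "px";
    }
}
```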

In short, the project is coming along wonderfully, but there are still a number of issues that need to be dealt with. Some of these are minor things like connecting the display results checkbox or changing the number of results displayed. Other things, however, will require feedback from the people who will be using it. These are things like what the results should look like, what types of searches are most useful, whether or not to have advanced search options and when to link to external webpages. Depending on the answers to these questions, the UI may still need to be completely redesigned. All in all though, I'm very happy, because it's already time to start asking and answering these questions.

Monday, May 28, 2007

Cracking the Cross-Domain Security Problem

It's been a while since the last post, and since GSoC officially started today, it seems an appropriate time for an update. A lot has happened in the meantime. I graduated from college, interviewed in Austin and San Francisco, and reached something of a milestone on the g3p project. But that's enough about me, so let's get down to the details.

As I mentioned in a previous entry, attaching the AJAX application (compiled by the Google Web Toolkit) to the Google Gadget host module using an IFRAME was not my preferred method for doing things. The reason is a well-known security measure taken by most modern browsers that prevents scripts (i.e. JavaScript) running in different windows (in the DOM sense of the word) with different document.domain values from communicating with one another. There is a better explanation here.

Let me try to explain more clearly what the gadget is actually doing and why this is a problem. In the gadget's module definition, I create an IFRAME whose src attribute points to the GWT app's html file, which is located on my personal server. Since my domain is not the same as Google's, attaching the application in this way means I cannot pass data back and forth from the gadget module to the app contained in the IFRAME. However, in order to save preferences (or use any of the Gadget API's libraries), I need to be able to talk to the gadget module outside of the IFRAME.
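If it helps, the browser's check can be approximated by the sketch below. This is my own simplified model of the rule (scheme, host and port must all match), written in plain Java for illustration; it ignores the document.domain relaxation and is not code from the g3p project.

```java
import java.net.URI;

// Hypothetical sketch of the browser rule, not code from the g3p project.
class SameOriginSketch {
    /** Returns the explicit port, or the scheme's default when none is given. */
    static int effectivePort(URI uri) {
        if (uri.getPort() != -1) {
            return uri.getPort();
        }
        return "https".equalsIgnoreCase(uri.getScheme()) ? 443 : 80;
    }

    /** Two pages may script each other only if scheme, host and port all match. */
    static boolean sameOrigin(String a, String b) {
        URI ua = URI.create(a);
        URI ub = URI.create(b);
        return ua.getScheme().equalsIgnoreCase(ub.getScheme())
                && ua.getHost().equalsIgnoreCase(ub.getHost())
                && effectivePort(ua) == effectivePort(ub);
    }
}
```

Under this model a page served from a gmodules.com host and a page served from www.flatown.com never match, which is exactly why the IFRAME contents and the gadget module cannot talk to each other.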

Of course, Google is very aware of this problem, which is why they have different methods for including content in the gadget module (described here.) You might say that this seems like an application naturally fitted for the url content type. However, using type url would mean I have to have server-side code in order to get and set the gadget preferences, which I don't want. It also means I can't use their convenience methods for working with remote content.

So you looked at that link on working with remote content, and you are wondering why I can't just use those methods to retrieve the content from my server and latch it onto the gadget, while using a type html gadget. Well I'll tell you why, because this was actually my second choice strategy, and I spent some time implementing it this way before I eventually settled on my final solution. It is true that these methods would allow you to bypass the problem, since you could just create iframes and use document.write() to put the results returned by the content fetcher functions into these iframes. You would also need to do this for included scripts, however, which would just require adding an extra layer of "parent" to any existing code you have. No big deal, right? Well, sort of, but the GWT app itself uses iframes and their src attribute to include its components, and if you are going through all this trouble to use the GWT, why would you want to have to completely change the compiled output?

Alright, so it's not impossible to do it this way, but it's *REALLY* messy, so if this were my only option, I'd rather go for type url. But I don't want to do type url, and what I want to do should be possible and relatively painless. After all, the module definition is hosted on my server and Google proxies that content using an application running on their domain, so why shouldn't I be able to do the same for the rest of my content? Well, they do it for the module definition using a URL like this:

http://56.gmodules.com/ig/ifr?url=http://www.flatown.com/g3p/g3p.xml&nocache=2147483647&up_example=test&lang=en&country=us&.lang=en&.country=us&synd=ig&mid=56&parent=http://www.google.com

But that ifr app is only for parsing module XML files, so that doesn't really help. It does make you think, however, that they might be doing something similar in order to power those _IG_Fetch... functions. This was actually my first thought in the process, but I couldn't find the URL, and I figured Google would have let people know about it if it was something they wanted them doing.

So at this point, I was pretty stuck. No matter which direction I went, it looked like there was going to be a tradeoff between a serious rewrite of the GWT files (which someone else is supposed to be doing this summer anyway), and starting to write my own server-side stuff. I was about to bite the bullet and finish modifying the GWT to use _IG_Fetch, since I had just finished an experiment proving that this would be possible, when by sheer luck a URL in the gadget message board caught my eye. It looked something like this:

http://gmodules.com/ig/proxy?

Sure enough, this was exactly what I was looking for: a proxy service on the gmodules domain to fetch any textual content. I reverted to the old iframes, patched everything up to go through the proxy, and voila, I can now pass data to my app! Everything looks the same as before, but I've now included a patch script in the source code repository to be run every time the g3p is compiled. I also got rid of the history frame, but theoretically this could be made to work. If you want to know exactly what I need the script for, you can take a look at it here.
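For a feel of what that kind of patching involves, here is a minimal sketch that rewrites absolute src attributes so they go through a proxy. It is not the actual patch script: the class and method names are hypothetical, the proxy prefix is a parameter because only the beginning of the proxy URL appears above, and the assumption that the target simply gets URL-encoded and appended to that prefix is mine.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: the real patch script and the proxy's query format are not shown in the post.
class ProxyPatchSketch {
    /** Appends the URL-encoded target to a caller-supplied proxy prefix. */
    static String proxied(String proxyPrefix, String targetUrl) {
        try {
            return proxyPrefix + URLEncoder.encode(targetUrl, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    /** Rewrites every absolute src="http..." attribute in the HTML to go through the proxy. */
    static String patchSrcAttributes(String html, String proxyPrefix) {
        Matcher m = Pattern.compile("src=\"(https?://[^\"]+)\"").matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = "src=\"" + proxied(proxyPrefix, m.group(1)) + "\"";
            // quoteReplacement keeps any '$' or '\' in the URL from being read as a group reference
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Relative src values are left alone in this sketch, so it only makes sense for output whose frames already point at absolute URLs.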

Anyway, I've really got to get going. I've left out some details and clarifications, but they will have to wait for another time. Until then, have fun running circles around cross-domain security.

Monday, April 23, 2007

Keep On Keeping On

So it didn't take too long for me to realize I had the g3p project set up all wrong. When I posted last I was using Google Code to host the GWT compiled files (which are the generated JavaScript/HTML/CSS files) and not the actual Java source. This was stupid for more than one reason. First of all, GWT names the important compiled files after a hash of their contents so that browsers never serve a stale cached copy, which means the file names change whenever the code does. This would have been a nightmare to manage with Subversion, so I'm glad I was able to fix this early on. Second of all, it doesn't make any sense to have an open-source project that doesn't provide you with the actual source. So after many iterations of trying to get the directory structure right, the svn trunk finally contains the *right* elements.
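To show why hashed file names and version control don't mix, here is a minimal sketch of content-based naming. It does not reproduce the GWT compiler's code; the class name, the method and the sample contents are hypothetical, and it only assumes an MD5-style name with a .cache.html suffix.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch of content-hash naming; it does not reproduce GWT's actual compiler code.
class HashedNameSketch {
    /** Names a compiled file after the MD5 of its contents: 32 hex characters plus ".cache.html". */
    static String hashedName(String contents) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(contents.getBytes(StandardCharsets.UTF_8));
            // Zero-pad so a digest with leading zero bytes still yields 32 characters.
            String hex = String.format("%032x", new BigInteger(1, digest));
            return hex + ".cache.html";
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }
}
```

Change one character of the source and the output gets a brand new name, so every compile would mean deleting the old files from the repository and adding the new ones.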

This means in order to compile you will have to:
a) create a new project in GWT with the name com.flatown.client.g3p (this requires downloading GWT of course)
b) checkout the source code from Google Code into the src/ directory for your GWT distribution (you might have to delete the files GWT created there first)

Of course, this doesn't solve the problem of being able to use it as a Google Gadget. I was not happy with the idea of checking out the source code on my server and having to sync/recompile it every time I made changes. Nor did I like that I would have to install Java on the server in order to do this (I don't even know if I'm allowed to). My solution to this was to create a bash script that executes a simple rsync command to synchronize my local public/ directory with my server's, which isn't all that clever. Nonetheless, after all this, I finally have a comfortable development cycle.

I've also already done a decent amount of work on the gadget itself, which can now be found at:

http://www.flatown.com/g3p/g3p.xml

(If you use the Google Personalized Homepage just click Add Stuff in the upper righthand corner, then click Add by URL next to the search box button and paste the above URL into the textbox that pops up)

All in all I'm pretty happy with the progress I've made. Next time I'll go into the details of the actual Java/GWT development process, which even so far could have been a whole blog entry unto itself. Anyway, back to work...

Wednesday, April 18, 2007

GWT Gadgets, etc.

So I knew that I wanted to create the g3p gadget using the Google Web Toolkit (GWT), since we all know writing AJAX by hand can be a pain in the ass and this is almost exactly what GWT was created for. However, when I wrote my first Google gadget for the student gadget contest back in November, I did in fact first write it without the GWT, and then later started to rewrite it using the GWT. If you want to see what these look like, here is Version 1 and Version 2 (these are links to the XML files, you can try them out by adding them to your Personalized Google homepage).

Getting the GWT app to work as a gadget is not as easy as you might think, however, and I am still not sure what the best way to do it is. I also noticed that Google suggested making this easier as number 2 of their GSoC ideas. Somebody did in fact pick up this idea (here), and so I emailed him asking if he had any suggestions for the right way to do this in the meantime. I haven't heard back yet though, so I decided to take matters into my own hands.

The first problem I ran into was that I was trying to use the Google hosting SVN repo as my file server, which it turns out is a bad idea. So I checked out a copy of the project on my flatown.com server. The next problem was that I had originally embedded the GWT XML inside the Gadget XML, but this was causing JavaScript domain security issues when the gwt.js file injects its iframes and they try to refer back to the parent. What I ended up having to do was put an iframe inside the Gadget module with the GWT project's HTML file as the source, though I am really not very happy with this solution. Nonetheless, I accepted that this was probably going to be the way I have to do it for the g3p gadget. The main reason I don't like this is that I am not looking forward to having to pass the gadget preferences to the GWT app, but perhaps this is the best/only way to do it. Anyway, that's the story and I am glad I can at least start working on the gadget in GWT for now.

Monday, April 16, 2007

Welcome

Welcome to my blog documenting the creation of the g3p (Google Gadget Gateway to PubMed) and my experience in the Google Summer of Code 2007. I am hosting the project here using Google Project Hosting. My mentoring organization is the Robert H. Lurie Comprehensive Cancer Center of Northwestern University.

I am really excited to begin the GSoC and I can't wait until I finish school so I can start working on the project full-time. I'm not really sure what form this blog is going to take, but I hope the rest of the posts won't be this boring. I'm going to try to update as often as possible, but I'm not expecting too much to happen until May. Stay tuned...