The Scenario…
A friend reached out to me with an interesting problem. The foreclosure lists maintained by King County show only the legal description of each lot, some basic statistics, and the parcel tax ID. While this may be useful for some, it is difficult, if not impossible, to tell where the houses are located without clicking through each individual listing. With thousands of listings, that workflow is impractical: sorting through the data by hand would take hundreds of hours.
The Quest…
My challenge was to develop a method for resolving the individual property listings into usable data, ideally an address and zip code. This would let my friend search by zip code, view the actual location of each property, and quickly export the results into Excel for further manipulation.
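Once each listing has an address, searching by zip code and exporting to Excel are the easy part. The sketch below only illustrates the idea and is not the extension's code. It assumes that each resolved address ends in a five-digit zip code. `Listing`, `zipOf`, `csvField`, and `toCsv` are hypothetical names. The sketch writes CSV, which Excel opens directly, rather than a native workbook.

```typescript
// Hypothetical shape of a resolved listing: Tax ID plus address (null if the lookup failed).
interface Listing {
  taxId: string;
  address: string | null;
}

// Assumes the zip code (optionally ZIP+4) sits at the very end of the address string.
function zipOf(address: string | null): string | null {
  const match = address === null ? null : /\b(\d{5})(?:-\d{4})?\s*$/.exec(address);
  return match ? match[1] : null;
}

// Quotes a CSV field only when it contains a comma, quote, or line break (RFC 4180 style).
function csvField(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Builds a CSV document, optionally keeping only the listings in one zip code.
function toCsv(listings: Listing[], zip?: string): string {
  const rows = listings
    .filter((l) => zip === undefined || zipOf(l.address) === zip)
    .map((l) => [l.taxId, l.address ?? "", zipOf(l.address) ?? ""].map(csvField).join(","));
  return ["TaxID,Address,Zip", ...rows].join("\r\n");
}
```

For example, `toCsv(listings, "98101")` would return a header row plus only the listings in that zip code, ready to save as a .csv file.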
The Process…
One of the first questions I had to answer was how best to collect and process the information. I could have written a program to pull the data down onto a separate server, manipulate it there, and then display it to the user. While that would have been the more traditional approach, it would have required deep HTML parsing, since King County does not offer an API for foreclosure lists. It would also have required either me or my friend to maintain a server, hosting, and an application. I wanted this to be as standalone a program as possible, requiring as little maintenance in the future as possible.
With server-side processing out of the question, the next step was to look at client-side options. I immediately turned to Chrome’s extension platform. Its maturity and extensibility let me perform the necessary DOM manipulations with little interference and keep the code contained to a single domain. It also ensured that all of the data processing happened on the client side. With this decision made, I dove into the netherworld of King County’s website to develop a solution…
The Magic…
Despite the spartan nature of the foreclosure list, the Tax IDs are one of the most beautiful parts of the page. These IDs are essentially primary keys: each one uniquely identifies a single property within King County. I set out to find a method on King County’s website for converting these IDs into addresses. After ruling out the official API, I dug into the deeper regions of King County’s website. After a few hours of searching, I stumbled onto an ancient XML-based API that offered unrestricted access for converting Tax IDs into addresses (and much more!). Needless to say, I was ecstatic. The data was there; now it was time to put the puzzle together…
The Pieces Come Together…
Leveraging the API I had discovered, I put together a quick jQuery library to resolve the Tax IDs into addresses. One of the major issues I ran into was rate-limiting the requests. There are thousands of listings on the foreclosure list, and each one required its own connection. Even when forcibly keeping connections open, it was trivial to crash a Chrome process on a machine with less than 3 GB of memory. To get around this, I used a master process that controlled how many child processes were spawned at any given time.

I also used HTML5 storage to cache the address values. By applying a custom hashing method to the quadratic field locations, I was able to determine with reasonable accuracy whether the page had changed. If it had not, the extension used the stored values rather than reaching out to the external server again.
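The two ideas above are capping the number of simultaneous requests and reusing cached results when the page has not changed. Below is a rough TypeScript sketch of both. It is a minimal illustration under stated assumptions, not the extension's actual code, and it makes three substitutions:

- The "master process" becomes a fixed-size pool of async workers pulling from a shared queue. No real child processes are spawned.
- The page fingerprint is a 32-bit FNV-1a hash of the page's Tax IDs. This stands in for the custom hash described above.
- The hypothetical `resolve` callback stands in for the undisclosed XML API call.

`KeyValueStore` has the same shape as `localStorage`, so the browser's storage can be passed in directly. `mapWithLimit` and `resolveWithCache` are illustrative names.

```typescript
// Minimal key-value interface; window.localStorage satisfies it in the browser.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// 32-bit FNV-1a over UTF-16 code units (same as byte-wise FNV-1a for ASCII input).
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime, mod 2^32
  }
  return hash >>> 0; // reinterpret as an unsigned 32-bit value
}

// Runs `worker` over every item with at most `limit` calls in flight at once.
// Results keep the same order as `items`.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item
  async function runWorker(): Promise<void> {
    while (next < items.length) {
      const index = next++; // claimed synchronously, so no two workers share an item
      results[index] = await worker(items[index]);
    }
  }
  const poolSize = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: poolSize }, () => runWorker()));
  return results;
}

// Returns cached addresses if this exact list of Tax IDs was seen before;
// otherwise resolves them with bounded concurrency and stores the result.
async function resolveWithCache(
  taxIds: string[],
  store: KeyValueStore,
  resolve: (taxId: string) => Promise<string | null>, // hypothetical lookup; null = failed
  limit = 4,
): Promise<Array<string | null>> {
  const pageKey = "addresses:" + fnv1a32(taxIds.join("|")).toString(16);
  const cached = store.getItem(pageKey);
  if (cached !== null) {
    return JSON.parse(cached) as Array<string | null>;
  }
  const addresses = await mapWithLimit(taxIds, limit, resolve);
  store.setItem(pageKey, JSON.stringify(addresses));
  return addresses;
}
```

The pool size is the knob that trades speed for memory. Having `resolve` return `null` instead of throwing keeps one failed lookup from rejecting the whole batch. Because the fingerprint is only 32 bits, two different pages can occasionally collide. Change detection is therefore reliable but not perfect.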
I also changed where the links themselves pointed. Rather than leading to the tax parcel details, they now linked directly to Google Maps, so my friend could see a property’s location on the map quickly and easily.
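Here is a minimal sketch of the link rewrite, assuming the resolved address is a plain string. It uses Google's documented Maps URLs format (`https://www.google.com/maps/search/?api=1&query=...`), which needs no API key. The post does not say which URL form the extension actually used. `mapsSearchUrl`, `LinkLike`, and `retargetLink` are hypothetical names.

```typescript
// Google Maps "search" URL (Maps URLs format, api=1) for a free-text address.
function mapsSearchUrl(address: string): string {
  return "https://www.google.com/maps/search/?api=1&query=" + encodeURIComponent(address);
}

// Anything with a writable href, such as an HTMLAnchorElement.
interface LinkLike {
  href: string;
}

// Points the link at Google Maps when an address was resolved;
// otherwise leaves the original parcel-details link untouched.
function retargetLink(link: LinkLike, address: string | null): void {
  if (address !== null) {
    link.href = mapsSearchUrl(address);
  }
}
```

In a content script, each listing's anchor element can be passed straight to `retargetLink`, since an `HTMLAnchorElement` has a writable `href`. Leaving the link alone when the lookup fails means unresolved listings still lead to their parcel details.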
My friend was impressed beyond belief, and in Seattle’s incredibly “hot” real estate market, every little advantage counts.
Other Notes…
The API successfully converted 91% of the Tax IDs into addresses; I am not sure what caused the remaining 9% to fail. My guess is that the API may have been running off an older dataset that did not contain those IDs. Either way, a 91% success rate is considerably better than having to click through every single listing.
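Keeping track of which lookups fail makes it easier to check a guess like this. The sketch below assumes each result is kept as a Tax ID plus an address, or `null` when the lookup failed. The `LookupResult` type and `summarizeLookups` helper are hypothetical names. The helper reports the success rate and lists the unresolved Tax IDs, so they could be checked by hand against the parcel pages.

```typescript
// Hypothetical lookup outcome: the Tax ID and its address, or null if nothing usable came back.
interface LookupResult {
  taxId: string;
  address: string | null;
}

// Fraction of lookups that produced an address, plus the Tax IDs that did not.
function summarizeLookups(results: LookupResult[]): { successRate: number; unresolved: string[] } {
  const unresolved = results.filter((r) => r.address === null).map((r) => r.taxId);
  const successRate =
    results.length === 0 ? 0 : (results.length - unresolved.length) / results.length;
  return { successRate, unresolved };
}
```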
Due to the sensitive nature of this project, I will not be releasing code samples or API endpoint locations. Happy Hunting! 🙂