I'm one of those people who most likely would have been diagnosed with ADD had it been as big an issue when I was a kid as it is now. I start projects, get to a certain point and then lose interest as another project pops up to take its place. It's always been about the nature of the project and how interesting it is. Some projects have been able to hold my interest longer than others simply by virtue of their complexity, the newness of the technology or the potential value of the project. One of those projects that has moved from the front burner to the back burner and back again is on my radar once more, and I'm working on it. There's an interesting little story about why it is back on my radar.
A couple of years ago, having lost all desire to work for my employer at the time, I went to work for the local library, which had advertised for someone who could do some web programming. Being an avid reader and fan of libraries (I've always loved the smell of the library - I know, kind of weird and probably more than you wanted to know about me) and given that Topeka is far from bursting at the seams with jobs for people in the computer industry (good or otherwise), I gave it a shot.
Long story short, the job only lasted a year for a variety of reasons. However, I had initially wanted to do a lot of work with the library's catalog. Both functionally and aesthetically, it is very 1994. While they've since fixed the aesthetic part of their website, the catalog remains something that could be turned into something worthwhile one day. To be fair, this is the state of most library catalogs and not limited to the one I briefly worked for. While there have been a couple of attempts at building better library catalogs, the practice doesn't appear to have gone mainstream. But the purpose of this post isn't to rag on library catalogs.
My interest in building a catalog that goes above and beyond what they currently do has, for the most part, been one of those projects that moves back and forth between the front burner and the back burner. The biggest barrier to working on it full throttle has been the lack of data - real data. As any developer knows, you need data to make your application work. Sure, you can always make up your own, but that is very time-consuming, and when you are trying to build an app that is industry-specific, your pieces and parts need to be industry-specific too.
So I was quite pleased when, one day, while looking at one of these attempts at a better OPAC (online public access catalog), I stumbled across a sample data file. Once loaded, the file turned out to hold about 150K records of library holdings like CDs, DVDs and books. It also contained author and subject data that could give me more to work with. The only problem was the formatting of the data, which is probably an industry standard. Loading this data in a manner that would allow me to actually work with it was going to be a challenge.
While it was nice that all the data was conveniently stored in one table, I wanted something completely different. I wanted the holdings, authors and subjects stored in separate tables. I also wanted many-to-many relationships and no duplicates. For example, if a subject like "Juvenile Fiction" came up for 100 books, I didn't want a new row created in the subjects table each time. I wanted the loading of this data to be smart enough to find the subject already created and use it for the many-to-many mapping. The same goes for authors. However, that's a decent-sized task considering how many authors and subjects there may be for any given holding.
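Here's a minimal sketch of that find-or-create idea, written in plain Python with the standard-library sqlite3 module rather than either framework's ORM. It is not the code from the project: the table names, column names and the helper functions `create_schema`, `get_or_create_subject` and `add_holding` are all made up for illustration.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical three-table layout: holdings, subjects, and a join table.
    conn.executescript("""
        CREATE TABLE holding (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
        CREATE TABLE subject (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
        CREATE TABLE holding_subject (
            holding_id INTEGER NOT NULL REFERENCES holding(id),
            subject_id INTEGER NOT NULL REFERENCES subject(id),
            PRIMARY KEY (holding_id, subject_id)
        );
    """)


def get_or_create_subject(conn, name):
    """Return the id of the subject named `name`, inserting it only if absent."""
    row = conn.execute(
        "SELECT id FROM subject WHERE name = ?", (name,)
    ).fetchone()
    if row is not None:
        return row[0]
    return conn.execute(
        "INSERT INTO subject (name) VALUES (?)", (name,)
    ).lastrowid


def add_holding(conn, title, subject_names):
    """Insert one holding and link it to shared subject rows."""
    holding_id = conn.execute(
        "INSERT INTO holding (title) VALUES (?)", (title,)
    ).lastrowid
    # dict.fromkeys drops repeats within one record while keeping order.
    for name in dict.fromkeys(subject_names):
        subject_id = get_or_create_subject(conn, name)
        conn.execute(
            "INSERT INTO holding_subject (holding_id, subject_id) VALUES (?, ?)",
            (holding_id, subject_id),
        )
    return holding_id
```

Loading two books that both carry "Juvenile Fiction" this way leaves a single "Juvenile Fiction" row in the subject table with two rows in the join table pointing at it. The same pattern would apply to authors. Django's ORM offers a built-in `get_or_create` on querysets that plays the role of the lookup helper here.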
I first tried processing the data all at once using Groovy on Grails. I mapped the data I had to a single domain class, created the others and defined the relationships I wanted. Since I had to extract data from varchar fields, there was some parsing and a few regular expressions involved in creating the author and subject objects. This didn't go over very well. I was never able to get it to complete without OutOfMemory errors popping up. Keep in mind that, while I wanted the process to be smart enough to recognize already existing authors, I didn't try that the first time around. Instead it just created a new author each time and established the many-to-many relationship.
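To give a feel for the kind of parsing involved, here's a small Python sketch. The actual field layout isn't described in this post, so the format below (several subjects packed into one text column, separated by semicolons, with optional trailing periods) is purely an assumption, and `split_subjects` is a hypothetical helper name.

```python
import re

# Assumed separator: a semicolon with optional whitespace on either side.
_SEPARATOR = re.compile(r"\s*;\s*")


def split_subjects(raw):
    """Split a packed subject field into clean values.

    Repeats are dropped case-insensitively; the first spelling seen wins
    and the original order is preserved.
    """
    if not raw:
        return []
    seen = set()
    values = []
    for part in _SEPARATOR.split(raw.strip()):
        # Trim whitespace and any trailing period (assumed punctuation).
        value = part.strip().rstrip(".").strip()
        key = value.casefold()
        if value and key not in seen:
            seen.add(key)
            values.append(value)
    return values
```

Stripping the trailing period suits subject headings under this assumed format, but it would be the wrong call for author names that end in something like "Jr.", so authors would want their own variant.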
However, I was able to get it to finish with Django. This process was the one without searching for existing authors and subjects. Technically, that's not what I really wanted, but it was good enough at first. I had to run the process for authors separately from the one for subjects, but it did finish.
In case you're curious why I used Groovy on Grails or Django: it seemed to be the easiest way to do things. Each gives me an ORM and a way to get the process running quickly.
Over the last weekend I was determined to get the process to run without creating duplicate subjects and authors. While I came very close in Django, it just could not finish the task. Since Groovy on Grails couldn't handle the all-at-once task either, I ended up using it to approach the task differently: as each item in the database is browsed, all of its authors and subjects are extracted and the relationships are saved appropriately.
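The browse-time approach can be sketched roughly like this in Python. Again, this is an illustration under assumptions rather than the Grails code itself: the `LazyCatalog` class is hypothetical, plain dictionaries stand in for the database tables, and the semicolon-separated subject field is an invented format.

```python
class LazyCatalog:
    """Extract and link subjects only when a holding is first browsed."""

    def __init__(self, raw_records):
        # raw_records maps holding id -> (title, packed subject string).
        self._raw = raw_records
        # Shared subjects, keyed case-insensitively so each exists once.
        self._subjects = {}
        # holding id -> list of subject names, filled in on first browse.
        self._links = {}

    def browse(self, holding_id):
        """Return (title, subjects), doing the extraction on first access."""
        title, packed = self._raw[holding_id]
        if holding_id not in self._links:
            names = [p.strip() for p in packed.split(";") if p.strip()]
            # setdefault reuses an existing subject instead of adding a copy.
            self._links[holding_id] = [
                self._subjects.setdefault(name.casefold(), name)
                for name in names
            ]
        return title, self._links[holding_id]

    def processed_count(self):
        """Number of holdings whose subjects have been extracted so far."""
        return len(self._links)

    def subject_count(self):
        """Number of distinct subjects created so far."""
        return len(self._subjects)
```

The trade-off is that the work is spread across page views instead of paid up front: nothing is extracted until a record is actually looked at, so no single pass has to chew through all 150K records at once.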
Although I still have to do this every time the app is restarted (since everything is dropped and created each time - I'll change that once I think I have the data model exactly as I want), it makes it possible to browse the data, which is part of the absolute minimum requirements for this app.
So where does it stand now? Well, it is still moving along nicely and staying on the front burner. The look and feel is nothing short of programmer-created UI, but the beautification will come later. There are other pieces and parts that would come together better if I had more knowledge of library-type stuff. I would like nothing more than to put it out there so that people could see it, use it and critique it. But right now the requirements for hosting outweigh my desire to move forward in that direction.
