Posts filed under ‘Enterprise software’

2008 Web Search is still in 1979

On Thursday last week (04/24/2008), I had the privilege of talking to Dr. Jim Martin’s Natural Language Processing (NLP) graduate class at the University of Colorado at Boulder about the work that we are doing at Filtrbox and the role that current NLP students will play in the future of information technology.  This blog post is the basis of my message to the class.

As I have written before, the problem that we face today is how to harness the data that is available on the web so that applications can interpret it meaningfully.  This problem is rooted in the assumption that the data stored on the web is “unstructured”.  Unlike the majority of the data processed by applications today, which is stored in some form of structure (e.g. a relational database), the data on the web is perceived as discrete pieces scattered all over the web.

I told the class that part of what I am doing at Filtrbox is an attempt to prove that the data on the web is not as “unstructured” as we may think today.  Within that data, there is a lot of structure, relationship and general interconnectedness, no matter how “discrete” we may think it is.  With effective mining of the data and good applications, we can interpret the data and produce meaningful information.  However, we are still far from applications that can interpret this data effectively.  The reason is that we have to address the problem of information retrieval (IR) first, before we can get to writing applications.

To recognize where we are today on the continuum of web data information retrieval and applications, a look at the evolution of enterprise applications gives us a great analogy:

Enterprise applications are where they are today primarily because they have a structured data storage model (Relational Database or RDB) and a standard access model (Structured Query Language or SQL).  Before there were enterprise applications as we know them today, there were only RDBs and SQL.  While RDB work dates back to the 1960s, the RDBs that most people are familiar with today had their beginnings in the 1970s.  The first commercially available implementation of RDB+SQL (or what is widely believed to be the first) was Oracle, released in 1979 by the company then known as Relational Software.  This provided the ability to query an RDB for data using SQL, but no applications as we know them today.  Carrying the analogy over to the web, this is where we are today.  We can go on Google or our favorite RSS readers (RDB analogy) and query for web data using a weak REST API or search form (SQL analogy), but we have no applications comparable to what is in the enterprise today to interpret that data.  So simply put, today we are where enterprise applications were in 1979.

My message to the class was that applications like Filtrbox are barely starting to scratch the surface when it comes to implementing applications on top of web data.  That is because, although it’s 2008, we are still in 1979.  The stumbling block is the perception of the “unstructured” nature of web data.  Today’s NLP students will play a large role tomorrow in identifying and establishing structure in the “unstructured” web data in order to move us beyond 1979.

April 28, 2008 at 12:51 am 1 comment

Correct RSS date format

If you see a date like “01/02/07” in an RSS feed, what do you do?  You write a blog post about it. 

The applications that I am working on rely on calculations using RSS dates.  I have noticed that the date format is probably the most taken-for-granted part of the RSS spec.  It is taken for granted because many consumers of RSS program around the date inconsistencies, so there is not much of an outcry.  However, when you see a date like 01/02/07, which could mean January 2, 2007, February 1, 2007 or even February 7, 2001, you have to stop and say something.

To those developers generating RSS feeds, please take a look at the date format required by the RSS specification.  I will summarize it here:

The RSS date must conform to the RFC 822 date-time format (refer to the BNF for “date-time” in section 5).  Examples of this format are:

Mon, 04 Feb 2008 08:00:00 EST

Mon, 04 Feb 2008 13:00:00 GMT

Mon, 04 Feb 2008 15:00:00 +0200

Do not just call a stringifying method on your date object and write the result to the RSS feed, because the default string form is rarely RFC 822.  Format the date explicitly in the format described above before writing it to the feed.
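As a rough illustration of formatting the date explicitly, here is a minimal sketch in Python.  The choice of language and of the standard library's `email.utils` module is mine and not something the RSS spec prescribes; `rss_date` is a hypothetical helper name, and the sample timestamp is made up for the example.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def rss_date(dt: datetime, use_gmt: bool = False) -> str:
    """Format a timezone-aware datetime as an RFC 822 style date string.

    Hypothetical helper for illustration; not part of any RSS library.
    """
    if dt.tzinfo is None:
        # A naive datetime has no unambiguous offset, so refuse to guess.
        raise ValueError("rss_date needs a timezone-aware datetime")
    if use_gmt:
        # format_datetime only accepts usegmt=True for UTC datetimes.
        dt = dt.astimezone(timezone.utc)
    return format_datetime(dt, usegmt=use_gmt)


# Made-up publication time: 4 February 2008, 13:00 UTC.
published = datetime(2008, 2, 4, 13, 0, 0, tzinfo=timezone.utc)

print(rss_date(published, use_gmt=True))
# Mon, 04 Feb 2008 13:00:00 GMT

# The same instant expressed with a +0200 offset.
local = published.astimezone(timezone(timedelta(hours=2)))
print(rss_date(local))
# Mon, 04 Feb 2008 15:00:00 +0200

# Consumers can parse such a string back into an aware datetime.
print(parsedate_to_datetime("Mon, 04 Feb 2008 15:00:00 +0200") == published)
# True
```

Rejecting naive datetimes is a design choice in this sketch: a date without a timezone cannot be compared reliably across feeds, which is exactly the kind of inconsistency consumers end up programming around.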

To validate whether your date is correct, you can use http://feedvalidator.org

February 4, 2008 at 7:19 pm 2 comments

That software may be around for a very long time….write it well.

During the holidays I was surfing the web and discovered forums dedicated to software that I wrote almost a decade ago.  It felt really good discovering that there are hordes of consultants out there being certified on architecture, designs and APIs that I conceived and developed (there is nothing like discovering that people’s passing of a certification hinges upon them knowing the meaning of a phrase or term that you coined).

Feeling proud of myself and maybe even a little boastful, I decided to anonymously answer a question in one of the free forums, since I would “obviously” be the final authority on such matters.  As soon as I posted the “obviously correct” answer to the question, there was a response from one veteran consultant, who indicated that I did not know what I was talking about and that I had it all wrong.  He then proceeded to teach me the correct usage of the part of the software under discussion.  WHOA!!! Wait a minute!!! But, I created the software!!! You can’t tell me the “correct usage” of my own API.  It turns out that after so many years of consulting on the software, many consultants have come up with very creative workarounds and ingenious uses of it.  I tip my hat to them because they are now doing things with the software that I did not even imagine at the time that I designed and developed it.  I was both proud and humbled after reading the response from the consultant.

This experience reminded me of the importance of architecting, designing and developing enduring software because you never know how long your code will be out there making a difference in people’s lives.

January 13, 2008 at 6:49 am Leave a comment

