Filtrbox Pizzabox Bug-Bash

Last night, Filtrbox commandeered “The Bunker” at TechStars (thanks to David Cohen) for the first Filtrbox Pizzabox Bug-Bash. We invited Boulder locals to come in, help us test Filtrbox and give us feedback on the product so far. The event was a success, and I would like to thank everyone who attended. The feedback we received from testers was great. Look out for more information about this event on the Filtrbox blog. We had an awesome evening of fun, pizza and beer; here are some pictures from last night:

[Photos: Filtrbox Pizzabox Bug-Bash]


March 27, 2008 at 6:53 am 1 comment

TechStars notes in the raw #2

(I took copious notes during TechStars 2007. I am opening up my notebook and sharing them with aspiring entrepreneurs.   I am going to serialize my notes on this blog.  These are my RAW notes, so sometimes people spoke too fast or were inaudible but I tried to get the gist of what they were saying. There is very little editing to these notes.)

The following questions were addressed during one of the early TechStars panels:

1) What kills most startups?

  • Surprise!! Surprise!! Not making money is not usually the big issue that causes failures unless you don’t have a vision
  • Team dynamic issues – startup failures are mostly caused by founding team friction
  • Companies fail due to execution failures. Execution failures are still team issues that can be categorized as follows: 
  1.  Team dysfunction issues
  2.  Team poor performance issues

1. and 2. are a “chicken and egg” situation

  • Do not be afraid to address team issues head on, solve them and remove the problem
  • Once you have a team issue problem that threatens your startup, re-adjust what you are doing or join another team (None of the original TechStars team members changed teams)
  • Beware of meandering, where after several weeks you are not getting anywhere. Address it and re-adjust immediately, because team members risk losing their passion when there is no progress

 2)  “Getting acquired” as a business model

  • Getting acquired is not a business model.  It’s a WISH!!!
  • Concentrate on building a business that has compelling value

 3) The “style” of a startup

  • You have the permission to create your own identity
  • Have an attitude
  • Have a style
  • Develop a style for your startup and yourself and work it (America’s Next Top Startup, anyone???) all the way through


March 5, 2008 at 8:52 am Leave a comment

TechStars notes in the raw #1

(I took copious notes during TechStars 2007. I am opening up my notebook and sharing them with aspiring entrepreneurs.   I am going to serialize my notes on this blog.  These are my RAW notes, so sometimes people spoke too fast or were inaudible but I tried to get the gist of what they were saying. There is very little editing to these notes.)

The following questions were addressed during one of the early TechStars panels:

 1)     Do you need a brilliant idea before starting?

  • NO!!! 
  • You just need to get going.  If you ask too many people before you start and you get feedback, you are probably selling yourself short. Just start.
  • Look for analogies in paradigms.  The first internet revolution was trying to implement an analogy of the non-digital world.  Seek the next analogy.   Also consider addressing areas that failed in the first internet revolution. 

2)     How do you know if you have an idea and you should step it up?

  • When you start having people expressing need and people catching on.
  • Listen – Listen to your peer group. Listen to the right people and the people that form your market. VCs are not necessarily the market.  
  • Sometimes you have to provide what people are going to need tomorrow (the example that was given here was what Greg Reinacker did with the concept behind Newsgator).  When you ask people, they probably will tell you what they needed yesterday and not know what they need tomorrow.  So you have to be ahead of the ball game.
  • There are two ways to describe how startup ideas evolve:
  1. Scratch someone else’s itch
  2. Scratch your own itch

Ultimately you have to move from 2. to 1.  

(My interpretation of what Brad Feld was saying here is that you either have to solve problems that you are having or problems that other people are having. But to be successful, you have to end up solving problems that other people are having if you want your idea to get off the ground.)

3)     Should you start a startup in the consumer space or the B2B space?

  • Relatively indifferent to consumer or B2B. The question is how you are extracting money, long term, from the people using your product
  • At this stage, getting a great service up and running is important but most importantly you need to think of how you will make money in the long term
  • Extracting money from the customer is an engineering problem. You are thinking about the “architecture of your business”.  Address how you interact with the customer for money. The internet is free but you have to work on something monetizable
  • Some people who think that they are addressing the consumer internet now may end up with a business solution. Keep your mind open.


March 4, 2008 at 8:24 am 4 comments

NLP: Unstructured thinking for unstructured data

In my last blog post, I talked about how we have had to develop Natural Language Processing (NLP) algorithms in order to overcome the lack of standardization on the web. At Filtrbox, the deeper we dig into the web for information, the more I find that we have to use an NLP concept here or half an NLP concept there to facilitate the process of mining unstructured data. NLP concepts now figure in the majority of our algorithms. I have begun to notice that my thought process as software architect, designer and developer shows the influence of NLP and machine learning concepts much more than before.

I think NLP fundamentals are essential for anyone who wants to build the next generation of web applications that process the unstructured data on the web. Yes, there are efforts to build a structured web via initiatives such as the Semantic Web and the various APIs being proposed. I respect these efforts; however, I would not rely on them alone. The proposed APIs provide access to structured data stored on various islands on the web. Users whose data does not live on those islands cannot reach it via the APIs. The Semantic Web is the initiative that will bring us closest to structured data on the web. However, given its painfully slow adoption, it looks like it’s going to be a while before we have some structure on the web.

The challenge is what to do now, while we wait for these initiatives to mature. I think that instead of waiting for content publishers to structure their content, we should process their content as is and programmatically infer its structure. Applying NLP concepts is one way to make these structural inferences, and it takes us a step closer to programmatic input, processing and storage of unstructured data. We have traditionally thought in terms of structured data, programmed for structured data and stored structured data. The challenge posed by the web today is an opportunity for software engineers to break new ground and start thinking in terms of, programming for and storing unstructured data.

February 29, 2008 at 8:56 am 2 comments

A case for standardizing blog templates

Alex Iskold of AdaptiveBlue has published a great post on “How YOU can make the web more structured”. A section of this post, “Standardizing Blog Templates Across Platforms”, really resonates with me. Iskold suggests that blogging platforms such as WordPress and TypePad standardize their templates. Why is this important?

To help answer this question, here is the Web 2.0 school of thought that I subscribe to. Let’s start off with an enterprise database analogy. The basic assumption is that blogs are nothing but a data store. While information in a blog makes for an interesting read, it is about as interesting as reading data in a text column in a relational database. While the data in a single text column may have a lot of meaning, its meaning and usefulness are enhanced when the data is combined with other columns in the same table, with other tables in the same database, or even with data in other databases. The wealth of data is hidden in its interconnections with other data. In order to harvest that wealth, applications are built on top of the databases that reference the data and make relational, semantic inferences between the data in the database(s). Today, blogs are the database(s). What is lacking are the applications that harvest the wealth of information stored in the blogs. These are the applications that the next wave of Web 2.0 companies (including my own) are working on.

The pace of these next-generation applications is being hindered by the lack of a consistent structure (standard) in blog data. What Iskold is bringing attention to is that, unlike relational databases, which adhere to the relational database management system standard (characterized by a simple TABLE/COLUMN/ROW+SQL structure that has been consistent over the years), blogs have no such standard. The structure of blogs is currently left up to blogging platforms such as WordPress, TypePad etc. Blogging standards today are akin to having Oracle, SQL Server and MySQL each use a different standard for storing and retrieving information: not only a different standard for each database, but a different standard for each version of each database. Exacerbating the problem, each of these databases would be customizable by anyone, and anyone could change the standard to one of their liking. If these databases were in such a state, it would be very difficult to write any applications that leverage their data. ODBC and JDBC standards would be very unreliable, if not useless. Such is the state of the blogosphere today when one looks at it from a data interface perspective.

As many of you know, I am currently devoted to working on the layer of applications that leverages the data in blogs and beyond in order to make such data more useful to users. The lack of standardization (as described above) makes it difficult to identify the content in blogs. Content identification is important because an application needs to be able to tell the actual blog post text apart from other text on the blog so that analyses and inferences can be made appropriately. I have been monitoring the different types of templates in an attempt to predict template patterns for the different blogging platforms (mainly WordPress, TypePad, Blogger, MovableType). I came to the conclusion that pattern prediction is only successful up to a point, for the following reasons:

1) the original templates from the blogging platform vendors consist of multiple major and minor versions that do not tag the template content in a predictably consistent way and

2) there are modified/hand coded templates floating out there which are totally unreliable.

As a result of these observations, I have resorted to writing my own content identification algorithms, which combine template pattern predictor algorithms with NLP-based semantic algorithms for identifying blog post text. While this has served me well up to now, a blog template standard would be very beneficial not only to me but to the many people who have not figured out how to get past the problem.

Iskold is suggesting that a standard be adopted with the goal of giving blog templates a consistent structure. This means adopting a template standard that identifies the different types of data in the different parts of a blog post. Iskold is suggesting that, on a blog post, the template should make it easy to identify the blog post text, the side bar, the name of the author, the date that the blog post was published, the tags for the blog post content and the blog post’s comments. I believe the adoption of this simple template will go a long way in helping to bring the next wave of Web 2.0 applications to market faster. I support a blog template standard.
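To make the benefit concrete, here is a minimal sketch of how little code a consuming application would need if templates marked each part of a post consistently. The class names (`post-body`, `post-author`, `post-date`, `post-tag`, `post-comment`) and the `PostExtractor` helper are made up for illustration; no such standard exists today, and the sketch assumes reasonably well-formed markup.

```python
from html.parser import HTMLParser

# Hypothetical class names that a template standard might define.
FIELDS = {"post-body", "post-author", "post-date", "post-tag", "post-comment"}
# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class PostExtractor(HTMLParser):
    """Collect the text found inside elements marked with a known class."""

    def __init__(self):
        super().__init__()
        self.fields = {name: [] for name in FIELDS}
        self._stack = []  # one entry per open element: a field name or None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        field = next((c for c in classes if c in FIELDS), None)
        if field is not None:
            self.fields[field].append("")  # start a new value for this field
        self._stack.append(field)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Credit the text to the innermost enclosing marked element, if any.
        for field in reversed(self._stack):
            if field is not None:
                self.fields[field][-1] += data
                break


sample = """
<div class="sidebar">Blogroll</div>
<div class="post">
  <span class="post-author">Jane</span>
  <span class="post-date">2008-02-04</span>
  <div class="post-body"><p>Hello <b>world</b>.</p></div>
  <a class="post-tag">rss</a>
</div>
"""

parser = PostExtractor()
parser.feed(sample)
print(parser.fields["post-body"])    # ['Hello world.']
print(parser.fields["post-author"])  # ['Jane']
```

The sidebar text is ignored simply because it carries none of the agreed class names, which is exactly the separation of post text from other text that is hard to achieve without a standard.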

February 4, 2008 at 9:06 pm Leave a comment

Correct RSS date format

If you see a date like “01/02/07” in an RSS feed, what do you do?  You write a blog post about it. 

The applications that I am working on are reliant on some calculations using RSS dates.  I have noticed that the RSS date specification is probably the most taken for granted part of the RSS spec.  It is taken for granted because many consumers of RSS program around the date inconsistencies so there is not much of an outcry.  However, when you see a date like 01/02/07, you have to stop and say something. 

To those developers generating RSS feeds, please take a look at the RSS date format specifications as per the RSS specification.  I will summarize it here: 

The RSS date must conform to the RFC-822 (refer to the BNF for “date-time”  in section 5) date time format.  Examples of this format are: 

Mon, 04 Feb 2008 08:00:00 EST

Mon, 04 Feb 2008 13:00:00 GMT

Mon, 04 Feb 2008 15:00:00 +0200

Do not just execute a stringifying method on your date object before writing it to the RSS feed.  Set the date format to the above mentioned format first before writing it to the RSS feed. 
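As an illustration of that advice, here is a minimal Python sketch that formats a date explicitly instead of stringifying it. It relies on the standard library's `email.utils` module, which writes dates in this RFC 822 style; the helper names `rss_date` and `looks_like_rss_date` are my own and not part of any RSS library. Note that the parser used for the check is lenient, so treat it as a sanity check rather than strict validation.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def rss_date(dt: datetime) -> str:
    """Format a timezone-aware datetime in the RFC 822 style RSS expects."""
    if dt.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    if dt.utcoffset() == timedelta(0):
        # Zero offset: emit the literal "GMT" zone name.
        return format_datetime(dt.astimezone(timezone.utc), usegmt=True)
    # Any other offset is written numerically, e.g. "+0200".
    return format_datetime(dt)


def looks_like_rss_date(text: str) -> bool:
    """Loose check: can the standard library parse this as an RFC 822 date?"""
    try:
        parsedate_to_datetime(text)
    except (TypeError, ValueError):
        return False
    return True


published = datetime(2008, 2, 4, 15, 0, 0, tzinfo=timezone(timedelta(hours=2)))
print(rss_date(published))                           # Mon, 04 Feb 2008 15:00:00 +0200
print(rss_date(published.astimezone(timezone.utc)))  # Mon, 04 Feb 2008 13:00:00 GMT
print(looks_like_rss_date("01/02/07"))               # False
```

Requiring a timezone-aware value up front is a deliberate choice here: a date without a zone is as ambiguous to a feed consumer as “01/02/07”.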

To validate whether your date is correct, you can use

February 4, 2008 at 7:19 pm 2 comments

A LEGENDary tribute

This afternoon my wife and I visited the recently opened Denver Museum of Contemporary Art. All the exhibits are great; however, there is one exhibit that consumed the majority of our time (and that of other museum goers as well). No, it was not some complex, hard-to-figure-out abstract art. It was the simple “Legend (a portrait of Bob Marley), 2005” by Candice Breitz.

Here is what Candice Breitz put together: In March 2005, 30 different people were filmed at the Gee Jam Studio in Port Antonio, Jamaica, singing a compilation of Bob Marley songs a cappella (no instrumental accompaniment). All 30 shots are then played simultaneously on a 30-channel installation viewed through 30 different flat-screen TVs (one person per TV screen). The coolest thing about this is that even though it looks like one giant movie screen from afar, the sound has a spatial effect: each TV has speakers right next to its screen, so each voice comes directly from the location of the person singing. I definitely sat there for more than 30 minutes (I could have watched all 62 minutes and 40 seconds of it) because, first, I am a huge Marley fan and, second, it was fascinating watching these 30 individuals sing these legendary songs. They were not perfect singers, did not necessarily hold a tune and did not necessarily know the words to the songs. However, I was captivated by their expressions, both facial and bodily, when they knew the words and when they were clueless. I loved the simplicity of the whole concept.

This is a great exhibit to check out when you are in Denver, especially if you are a Marley fan. Be warned that this exhibit is pretty loud (which I think may annoy some people).  The voices of these 30 individuals echo through the whole museum.  If you are a Marley fan, it’s a great sound track while you check out the cool exhibits that they have at the Denver Museum of Contemporary Art.

January 27, 2008 at 5:07 am Leave a comment
