Posts filed under ‘REST’

Filtrbox vs. RSS readers/aggregators

One of the questions that I am often asked is how Filtrbox is different from traditional RSS readers and aggregators.  The following are the major differences:

Closed Search Domain vs. Open Search Domain

When using traditional RSS aggregators, the user supplies the list of RSS feeds. This means that the domain of information gathered by a traditional RSS reader/aggregator is limited to the RSS feeds that are known to the user. I call this a closed search domain. However, in an environment such as the one we have today, where thousands of new content sources are created daily and anyone can potentially become a publisher, it is unrealistic to put the burden on the user to keep up with all of them. Filtrbox takes this burdensome responsibility away from the user and discovers the new content sources for the user, because Filtrbox’s search domain covers all the new content sources. I call this an open search domain. The user can also add RSS feeds to the search domain, thereby guaranteeing that their RSS feeds of interest are searched. This approach leads to the user discovering new content sources.

Publisher centric vs. Content centric

Traditional RSS readers/aggregators present to the user all the content that is published by a specific publisher, regardless of whether the user is interested in the content or not. Thus, the traditional RSS readers/aggregators implement a publisher centric information consumption model. On the other hand, Filtrbox implements a content centric information consumption model. Rather than deliver to the user all the content published by a specific publisher, whether it’s relevant or not, Filtrbox allows the user to filter for the content that they are interested in from ANY publisher by providing contextual keywords. The content centric model implemented by Filtrbox greatly reduces information overload because each piece of content is examined and filtered for contextual relevance before it is delivered to the user.
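To make the contrast concrete, here is a toy sketch of the two consumption models. It is not Filtrbox's algorithm: the `Item` class, both function names, and the plain keyword match are assumptions made up for illustration, and real contextual relevance filtering involves far more than matching words.

```python
import re
from dataclasses import dataclass


@dataclass
class Item:
    """A piece of content; the fields are hypothetical."""
    publisher: str
    text: str


def publisher_centric(items, subscribed):
    """Deliver everything from the subscribed publishers, relevant or not."""
    return [item for item in items if item.publisher in subscribed]


def content_centric(items, keywords):
    """Deliver items from ANY publisher that mention at least one keyword."""
    wanted = {k.lower() for k in keywords}
    return [
        item for item in items
        if wanted & set(re.findall(r"\w+", item.text.lower()))
    ]


items = [
    Item("blog-a", "RSS date formats explained"),
    Item("blog-a", "My holiday photos"),
    Item("blog-b", "Validating RSS feeds"),
]

# Publisher centric: both posts from blog-a, including the off-topic one.
print(publisher_centric(items, {"blog-a"}))
# Content centric: the two RSS posts, one from each publisher.
print(content_centric(items, ["rss"]))
```

The publisher centric call returns the holiday photos post because it only looks at who published it. The content centric call ignores the publisher and looks at the content itself.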

No filtering vs. Contextual relevance filtering

As indicated above, traditional RSS aggregators do not filter the content. All content published by a publisher in the user’s closed search domain is delivered to the user regardless of whether it is relevant or not. Filtrbox applies algorithms that filter content from an open search domain of publishers for contextual relevance. Filtrbox uses multiple factors to determine the contextual relevance of content and assigns a score called FiltrRank. The most important feature of the contextual relevance algorithm is that it learns from a Filtrbox user’s implicit interests and applies those interests to future contextual relevance filtering. This means that the content delivered to a user is content that this specific user is interested in, not content other people are interested in. Contextual relevance filtering plays a large part in the reduction of information overload.

Beyond RSS

Unlike traditional RSS readers/aggregators, Filtrbox consumes content delivery formats beyond RSS. Filtrbox is capable of consuming both standard and proprietary content delivery formats.

August 26, 2008 at 10:35 pm 1 comment

A case for standardizing blog templates

Alex Iskold of AdaptiveBlue has published a great post on “How YOU can make the web more structured”. A section of this post, “Standardizing Blog Templates Across Platforms”, really resonates with me. Iskold is suggesting that blogging platforms such as WordPress and TypePad standardize their templates. Why is this important?

To help answer this question, here is the Web 2.0 school of thought that I subscribe to. Let’s start off with an enterprise database analogy. The basic assumption is that blogs are nothing but a data store. While information in a blog makes for an interesting read, it is about as interesting as reading data in a text column in a relational database. While the data in a single text column may have a lot of meaning, its meaning and usefulness are enhanced when the data is combined with other columns in the same table, with other tables in the same database, or even with data in other databases. The wealth of data is hidden in its interconnections with other data. In order to harvest the wealth of data in databases, applications are built on top of the databases that reference and make relational semantic inferences between the data in the database(s). Today, blogs are the database(s). What is lacking are the applications that harvest the wealth of information stored in the blogs. These are the applications that the next wave of Web 2.0 companies (including mine) are working on.

The pace of these next generation applications is being hindered by the lack of a consistent structure (standard) in blog data. What Iskold is bringing attention to is that unlike relational databases, which adhere to a relational database management system standard (characterized by a simple TABLE/COLUMN/ROW+SQL structure that has been consistent over the years), blogs have no such standard. The structure of blogs is currently left up to the blogging platforms such as WordPress, TypePad, etc. Blogging standards today are akin to having Oracle, SQL Server, and MySQL each using a different standard for storing and retrieving information. Not only a different standard for each of the databases, but a different standard for each version of each database. Exacerbating the problem further, imagine each of these databases being customizable by anyone, so that anyone can change the standard to one of their liking. If these databases were in such a state, it would be very difficult to write any applications that leverage data from them. ODBC and JDBC standards would be very unreliable, if not useless. Such is the state of the blogosphere today when one looks at it from a data interface perspective.

As many of you know, I am currently devoted to working on the layer of applications that leverages the data in blogs and beyond in order to make such data more useful to users. The lack of standardization (as described above) makes it difficult to identify the content in blogs. Content identification is important because an application needs to be able to tell the difference between actual blog post text and other text on the blog so that analyses and inferences can be established appropriately. I have been monitoring the different types of templates in an attempt to predict template patterns for the different blogging platforms (mainly WordPress, TypePad, Blogger, MovableType). I came to the conclusion that pattern prediction is only successful up to a certain point, due to the following:

1) the original templates from the blogging platform vendors consist of multiple major and minor versions that do not have a predictable consistency in the template content tagging and

2) there are modified/hand coded templates floating out there which are totally unreliable.

As a result of these observations, I have resorted to writing my own content identification algorithms that include a combination of template pattern predictor algorithms and NLP based semantic blog post text identification algorithms. While this has served me well up to now, a blog template standard will be very beneficial not only to me but also to the many people who have not figured out how to get past the problem.

Iskold is suggesting that a standard be adopted with the goal of giving blog templates a consistent structure. This means the adoption of a template standard that identifies the different types of data on the different parts of a blog post. Iskold is suggesting that on a blog post, the template should make it easy to identify the blog post text, the side bar, the name of the author, the date that the blog post was published, the tags for the blog post content and the blog post’s comments. I believe the adoption of this simple template will go a long way in helping to bring the next wave of Web 2.0 applications to market faster. I support a blog template standard.
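To illustrate how much simpler content identification becomes once templates agree on structure, here is a minimal sketch. The class names (`post-body`, `post-author`, and so on) are hypothetical: no such cross-platform standard exists, and the proposal described above does not prescribe these names. The sketch also assumes well-formed HTML in which every non-void element is closed.

```python
from html.parser import HTMLParser

# Hypothetical class names that a shared template standard might define.
FIELDS = {"post-body", "post-author", "post-date",
          "post-tags", "post-comments", "sidebar"}
# Elements that never have a closing tag, so they must not be stacked.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class PostExtractor(HTMLParser):
    """Collect the text found inside elements marked with a known class."""

    def __init__(self):
        super().__init__()
        self.fragments = {}   # field name -> list of raw text fragments
        self._stack = []      # one entry per open element: field name or None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        self._stack.append(next((c for c in classes if c in FIELDS), None))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Attribute the text to the innermost enclosing field, if any.
        for field in reversed(self._stack):
            if field is not None:
                self.fragments.setdefault(field, []).append(data)
                break

    def text(self, field):
        """Return the collected text for a field with whitespace normalized."""
        return " ".join("".join(self.fragments.get(field, [])).split())


page = """
<div class="post">
  <span class="post-author">Jane Doe</span>
  <div class="post-body"><p>Hello <b>world</b>.<br>Second line.</p></div>
  <div class="sidebar">Blogroll</div>
</div>
"""

extractor = PostExtractor()
extractor.feed(page)
extractor.close()
print(extractor.text("post-author"))
print(extractor.text("post-body"))
```

With agreed-upon markers, a few dozen lines of standard library code are enough to separate the post text from the side bar. Without them, the same job needs per-platform, per-version pattern prediction plus a text-based fallback.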

February 4, 2008 at 9:06 pm Leave a comment

Correct RSS date format

If you see a date like “01/02/07” in an RSS feed, what do you do?  You write a blog post about it. 

The applications that I am working on are reliant on some calculations using RSS dates.  I have noticed that the RSS date specification is probably the most taken for granted part of the RSS spec.  It is taken for granted because many consumers of RSS program around the date inconsistencies so there is not much of an outcry.  However, when you see a date like 01/02/07, you have to stop and say something. 

To those developers generating RSS feeds, please take a look at the RSS date format specifications as per the RSS specification.  I will summarize it here: 

The RSS date must conform to the RFC-822 (refer to the BNF for “date-time”  in section 5) date time format.  Examples of this format are: 

Mon, 04 Feb 2008 08:00:00 EST

Mon, 04 Feb 2008 13:00:00 GMT

Mon, 04 Feb 2008 15:00:00 +0200

Do not just execute a stringifying method on your date object before writing it to the RSS feed.  Set the date format to the above mentioned format first before writing it to the RSS feed. 
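As one way to do this, here is a minimal sketch in Python using the standard library's `email.utils.format_datetime`, which writes English day and month names regardless of the system locale. The `rss_date` helper name is my own, and refusing datetimes without a time zone is a choice made for this sketch, not something the RSS specification requires.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def rss_date(dt: datetime) -> str:
    """Format a time zone aware datetime in the RFC 822 style RSS expects."""
    if dt.tzinfo is None:
        # A naive datetime would be written with a "-0000" zone, which
        # hides the fact that the real offset is unknown.
        raise ValueError("attach a time zone before formatting")
    return format_datetime(dt)


published = datetime(2008, 2, 4, 15, 0, 0,
                     tzinfo=timezone(timedelta(hours=2)))

print(rss_date(published))
# Mon, 04 Feb 2008 15:00:00 +0200

# usegmt=True writes "GMT" instead of "+0000"; it requires a UTC datetime.
print(format_datetime(published.astimezone(timezone.utc), usegmt=True))
# Mon, 04 Feb 2008 13:00:00 GMT
```

Compare this with `str(published)`, which produces `2008-02-04 15:00:00+02:00` and is not a valid RSS date.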

To validate whether your date is correct, you can use http://feedvalidator.org
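For a quick local sanity check before publishing, a rough sketch like the following can catch the worst offenders. It relies on Python's `email.utils.parsedate_to_datetime`, which is a lenient parser, so a passing result is not proof of conformance; the validator above remains the real test. The `check_rss_date` name and the extra day-name comparison are my own additions.

```python
from email.utils import parsedate_to_datetime

DAY_NAMES = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def check_rss_date(value: str) -> bool:
    """Return False for dates that are unparseable or name the wrong day."""
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError, newer ones ValueError.
        return False
    # The parser ignores the optional day name, so compare it ourselves.
    day, comma, _rest = value.strip().partition(",")
    if comma and day != DAY_NAMES[parsed.weekday()]:
        return False
    return True


print(check_rss_date("01/02/07"))
print(check_rss_date("Tue, 26 Aug 2008 22:35:00 GMT"))
```

The first call returns `False` because the string cannot be parsed at all. The second returns `True`; changing `Tue` to any other day name would make it fail, since 26 Aug 2008 was a Tuesday.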

February 4, 2008 at 7:19 pm 2 comments

