Tuesday, February 21, 2012

RSS Search

Hi! I'm looking for ideas on the best approach to designing a
search system for RSS feeds. I will have some 50 RSS feeds (all RSS
2.0 compliant) stored locally on the web server, and I'm wondering
what the best method would be to allow searching of these RSS files.
Since the search will serve multiple users, the search system has to
be robust and efficient. Some ideas that I have for the RSS search
system are:
1. Store all RSS files locally on the web server file system and
perform file system queries. But I guess this might get slow when
many users search at once. Moreover, the queries may not be
extensible (for example, to allow Boolean operations).
2. Move the RSS data into the database and then perform the search
using LIKE (or the more advanced indexing service features).
3. Use a third-party full-text search engine like Lucene.
4. Use something like XQuery or XPath to query the RSS files directly,
but this again *might* (I'm not sure, since I haven't worked with either)
get slow when many users search at once.
Also, the RSS files I have on the web server will be updated every
hour or so.
So, I have the ideas, but I'm not quite sure which one would be the most
suitable and efficient. If anyone has ideas on implementing such a
search system for RSS feeds, please share your insight. Thank you,
guys!
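To make option 4 concrete, here is a minimal sketch in Python. It uses the standard library's ElementTree, which supports only a limited XPath subset, to scan one RSS 2.0 document for a keyword. The sample feed and the `search_rss` name are hypothetical. Note that each call re-parses the XML, which is exactly the per-request cost the question worries about; with files refreshed hourly, parsing once per refresh and caching the result would be the obvious mitigation.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed standing in for one of the local RSS 2.0 files.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>http://example.com/</link>
    <description>Hypothetical sample feed</description>
    <item>
      <title>SQL Server 2005 Express beta</title>
      <link>http://example.com/1</link>
      <description>XQuery over the new xml data type</description>
    </item>
    <item>
      <title>Lucene tips</title>
      <link>http://example.com/2</link>
      <description>Full-text search outside the database</description>
    </item>
  </channel>
</rss>"""


def search_rss(xml_text, keyword):
    """Return (title, link) pairs for items whose title or description contains keyword."""
    root = ET.fromstring(xml_text)  # the root element is <rss>
    needle = keyword.lower()        # case-insensitive substring match
    hits = []
    # ElementTree's limited XPath: select every <item> under <channel>.
    for item in root.iterfind("./channel/item"):
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        if needle in title.lower() or needle in description.lower():
            hits.append((title, item.findtext("link", default="")))
    return hits
```

For a file on disk, `ET.tostring(ET.parse(path).getroot(), encoding="unicode")` produces text that can be passed in. A more direct route is to adapt the function to take the parsed root.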
You might want to shred the XML docs/RSS feeds, store them in a
relational database, full-text index (FTI) the columns of interest, and query them there.
I would advise against storing them in the file system or storing them in
XML format in image type columns. Although Indexing Services and SQL FTS
do support querying XML/RSS feeds using the XML iFilter, you can't index
properties using SQL Server FTS, and Indexing Services support isn't much
better.
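The shredding approach can be sketched as follows, assuming SQLite (Python's built-in `sqlite3`) as a stand-in for SQL Server and a plain `LIKE` match in place of a real full-text index. In SQL Server you would full-text index the title/description columns and query them with `CONTAINS` instead. The table layout, the sample feed, and the function names are all hypothetical. Each `<item>` becomes a row, and an hourly refresh replaces one feed's rows inside a single transaction.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample feed; in practice this is one of the ~50 local RSS files.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>http://example.com/</link>
    <description>Hypothetical sample feed</description>
    <item>
      <title>SQL Server 2005 Express beta</title>
      <link>http://example.com/1</link>
      <description>XQuery over the new xml data type</description>
    </item>
    <item>
      <title>Lucene tips</title>
      <link>http://example.com/2</link>
      <description>Full-text search outside the database</description>
    </item>
  </channel>
</rss>"""


def create_schema(conn):
    # One row per <item>; "feed" records which RSS file the row came from.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        "feed TEXT NOT NULL, title TEXT, link TEXT, description TEXT)"
    )


def load_feed(conn, feed_name, xml_text):
    """Shred one RSS 2.0 document into rows, replacing that feed's old rows."""
    root = ET.fromstring(xml_text)
    rows = [
        (feed_name,
         item.findtext("title", default=""),
         item.findtext("link", default=""),
         item.findtext("description", default=""))
        for item in root.iterfind("./channel/item")
    ]
    with conn:  # one transaction, so a reload never leaves a half-loaded feed
        conn.execute("DELETE FROM items WHERE feed = ?", (feed_name,))
        conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?)", rows)
    return len(rows)


def search(conn, keyword):
    """Substring match on title or description via LIKE (case-insensitive for ASCII).

    Note: % and _ in keyword are not escaped and act as LIKE wildcards.
    """
    pattern = "%" + keyword + "%"
    return conn.execute(
        "SELECT feed, title, link FROM items "
        "WHERE title LIKE ? OR description LIKE ? ORDER BY feed, title",
        (pattern, pattern),
    ).fetchall()
```

Usage: `conn = sqlite3.connect(":memory:")`, then `create_schema(conn)`, `load_feed(conn, "demo", SAMPLE_RSS)` and `search(conn, "xquery")`. Because the database, not the web server's file system, handles concurrent readers, many users can search while the hourly reload runs.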
You could index the XML/RSS as plain text, but there are some problems with indexing the
XML tags.
XQuery FTS support will arrive with SQL Server 2005, which will RTM next year.
Please refer to:
http://msdn.microsoft.com/xml/defaul.../sql2k5xml.asp
for more info.
Hilary Cotter
Looking for a book on SQL Server replication?
http://www.nwsu.com/0974973602.html
"RiceGuy" <9icj4u613jeqrx8@.jetable.org> wrote in message
news:d7851925.0407242144.30e9c55f@.posting.google.com...
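|||You may want an XQuery-friendly database such as SQL Server 2005, which
has an xml data type for columns, e.g.
create table table1 (i int, x xml)
Now you can query the xml column directly with XQuery. For example, assuming
each row's x holds one RSS 2.0 document (the path below is illustrative):
-- rows where some item title contains "SQL"; exist() returns 1 on a match
select i, x from table1
where x.exist('/rss/channel/item/title/text()[contains(., "SQL")]') = 1
A free download is available at: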
http://lab.msdn.microsoft.com/express/sql/
|||RiceGuy,
In addition to what Hilary has recommended, the web article
"Creating SQL Based RSS Feed ..." at http://www.sswug.org/see/18299 defines
a sample table and data along with a stored procedure "GenerateRssFeed", and then
uses the following SQL script to generate the RSS feed:
EXEC sp_makewebtask
    @outputfile = 'C:\Rss.xml',               -- Point 1: the RSS file to write
    @query = 'EXEC GenerateRssFeed',          -- put the SP name here
    @templatefile = 'C:\RssFeedTemplate.xml'  -- Point 2: the RSS template file
The article is good and explains how an RSS feed can be generated
directly from SQL Server 2000. The same article can also be found at
http://www.dotnetforce.com/(0eqeob4525xs2h55fmagg3zz)/Content.aspx?t=a&n=204
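The article's technique is specific to SQL Server 2000 (sp_makewebtask plus a template file). As a rough, non-authoritative illustration of the same idea, turning query result rows into an RSS 2.0 document, here is a Python sketch using the standard library's ElementTree. The function name `build_rss` and the (title, link, description) row shape are hypothetical and not taken from the article. ElementTree escapes characters such as `&` automatically, which is easy to get wrong when filling in a text template by hand.

```python
import xml.etree.ElementTree as ET


def build_rss(channel_title, channel_link, channel_description, rows):
    """Build an RSS 2.0 document (as a str) from (title, link, description) tuples."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required RSS 2.0 channel elements.
    for tag, text in (("title", channel_title),
                      ("link", channel_link),
                      ("description", channel_description)):
        ET.SubElement(channel, tag).text = text
    # One <item> per result row, e.g. rows fetched from a query.
    for title, link, description in rows:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
        ET.SubElement(item, "description").text = description
    return ET.tostring(rss, encoding="unicode")
```

Writing the result to disk (for example with `open("Rss.xml", "w", encoding="utf-8")`) plays the role of sp_makewebtask's @outputfile.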
If you're interested in the XQuery support (but not the FTS component), you
might want to review the newly released beta version of SQL Server 2005
Express, the MSDE 2000 replacement, at:
http://lab.msdn.microsoft.com/express/sql/default.aspx
Regards,
John
"Hilary Cotter" <hilaryk@.att.net> wrote in message
news:uSVlTCkcEHA.4092@.TK2MSFTNGP11.phx.gbl...