Sunday, September 12, 2004 12:12 AM bart

Screenscraping my "number of ASP.NET posts"

Ever wondered how I get the number of my ASP.NET Forums posts on my homepage? The answer is screen scraping combined with a regular expression. Here's the code:

<%@ OutputCache Duration="30" VaryByParam="none" %>
<%@ Control Language="C#" %>
<%@ Import Namespace="System.Text.RegularExpressions" %>
<%@ Import Namespace="System.IO" %>
<%@ Import Namespace="System.Net" %>

<script runat="server">
 // URL of the page to scrape
 private string URL = "";

 public void Page_Load(object sender, System.EventArgs e)
 {
   try
   {
     WebClient clnt = new WebClient();
     Stream s = clnt.OpenRead(URL);
     StreamReader r = new StreamReader(s);
     string res = r.ReadToEnd();
     r.Close();
     // Capture whatever sits between "contributed to" and "out of"
     Regex regex = new Regex("contributed to ((.|\n)*?) out of", RegexOptions.IgnoreCase);
     Match oM = regex.Match(res);
     lblPosts.Text = oM.Groups[1].ToString().Replace(",","");
   }
   catch
   {
     // Scraped site down or unreachable
     lblPosts.Text = "unable to retrieve";
   }
 }
</script>

<asp:Label id="lblPosts" runat="server" />

Pretty simple, isn't it? However, don't forget to cache the whole thing (this is the code of an .ascx, so the OutputCache directive causes "partial page caching" of the homepage). The code also needs a try...catch block to handle the "scraped site down" and "scraped site redesigned" cases.
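
To try the pattern without hitting the live site, the extraction step can be pulled out into a small console program. This is only a sketch: the class name `PostCountScraper`, the helper `ExtractPostCount`, and the sample markup are made up for illustration, and the real page may be laid out differently. It also checks `Match.Success`, so a redesigned page gives the fallback text instead of an empty label.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PostCountScraper
{
    // Hypothetical helper: returns the post count without thousands
    // separators, or null when the pattern is not found in the page.
    public static string ExtractPostCount(string html)
    {
        // Singleline lets '.' match newlines, replacing the (.|\n) alternation.
        Match m = Regex.Match(html, @"contributed to\s+(.*?)\s+out of",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value.Replace(",", "") : null;
    }

    public static void Main()
    {
        // Made-up markup, assumed to resemble the scraped profile page.
        string sample = "<p>bart has contributed to 1,234 out of 500,000 posts.</p>";
        Console.WriteLine(ExtractPostCount(sample) ?? "unable to retrieve");

        // A page without the expected phrase falls back to the error text.
        Console.WriteLine(ExtractPostCount("<p>redesigned page</p>") ?? "unable to retrieve");
    }
}
```

In the user control, the same check would go right after the regex.Match call, next to the existing catch block.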



