C#: Retrieve data from webpage

September 29th, 2008

So I came across a fun assignment this week that I’m sure has been done by many different people in many different programming languages. The challenge was to “scrape” a website for information autonomously and save it off to a file.

I accomplished this by first using a wrapper class around .NET’s own HttpWebRequest object, which simplifies posting to a web site and retrieving the result. I then used regular expressions to find the data I wanted, stored it in a string, and later wrote it to a file.
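To give a rough idea of what such a wrapper takes care of, here is a minimal sketch of a GET request made directly with HttpWebRequest. This is not the wrapper class itself; the names SimpleFetcher, BuildQuery, and Get are made up for illustration.

```csharp
using System;
using System.Collections.Specialized;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper, not part of the wrapper class mentioned above.
public static class SimpleFetcher
{
    // Build "key=value&key2=value2" with URL-escaped keys and values.
    public static string BuildQuery(NameValueCollection items)
    {
        StringBuilder sb = new StringBuilder();
        foreach (string key in items.AllKeys)
        {
            if (sb.Length > 0) sb.Append('&');
            sb.Append(Uri.EscapeDataString(key));
            sb.Append('=');
            sb.Append(Uri.EscapeDataString(items[key]));
        }
        return sb.ToString();
    }

    // Issue a GET request and return the response body (the page's HTML).
    public static string Get(string url, NameValueCollection items)
    {
        HttpWebRequest request =
            (HttpWebRequest)WebRequest.Create(url + "?" + BuildQuery(items));
        request.Method = "GET";
        using (WebResponse response = request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

The query building is kept separate from the network call so it can be checked without hitting a real site.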

I’m not going to provide the specific program I wrote, as it’s still proprietary, but I will give a small example of how this can be done. The example will include posting to a website, retrieving the results (the HTML for the page), and parsing the resulting page to find what you want.

The class I used to post to the site was done by Robert May and can be found here:

Here is an example of using this class to perform a search on Craigslist under the ‘for sale’ category and retrieve the results:

// Create the post object
PostSubmitter post =
    new PostSubmitter("");
// Add our parameters (the parameter name "query" is an assumption)
post.PostItems.Add("query", "Ford Truck");
// Specify our action type (Post | Get)
post.Type = PostSubmitter.PostTypeEnum.Get;
// Retrieve the results
string result = post.Post();
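
Once result holds the page’s HTML, the parsing and saving steps could look something like the sketch below. The pattern (grab the text of every link) and the names ResultParser, ExtractLinkText, and Save are only assumptions for illustration; the real expression depends entirely on the markup of the page you are scraping.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class ResultParser
{
    // Assumed pattern: capture the visible text of each anchor tag.
    private static readonly Regex LinkText = new Regex(
        @"<a\s[^>]*>(?<text>[^<]+)</a>", RegexOptions.IgnoreCase);

    // Return the trimmed text of every link found in the HTML.
    public static List<string> ExtractLinkText(string html)
    {
        List<string> found = new List<string>();
        foreach (Match m in LinkText.Matches(html))
        {
            found.Add(m.Groups["text"].Value.Trim());
        }
        return found;
    }

    // Write one extracted item per line to the given file.
    public static void Save(string path, List<string> lines)
    {
        File.WriteAllLines(path, lines.ToArray());
    }
}
```

With the earlier example, this would be called as ResultParser.Save("results.txt", ResultParser.ExtractLinkText(result)), where results.txt is just a placeholder file name.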
