Wednesday, 20 December 2017

What is Web Scraping? How to do Web Scraping in C#?


What is Web Scraping?
             
Web scraping is a technique for extracting large amounts of data from websites. The extracted data can be saved to the local file system or to database tables.

Scraping can be done manually, but these days people write their own software to do the same task in a few seconds or minutes.

If you want data from a website, you download the page by its URL, parse it by its elements, attributes, or other selectors, and then save the results to your database or local drive.

How to implement Web Scraping in C#?

To start working on web scraping, install the HtmlAgilityPack package from NuGet, for example by running Install-Package HtmlAgilityPack in the Package Manager Console.



Visit the following URL:

https://www.yellowpages.com/search?search_terms=software+develop&geo_location_terms=Sydney%2C+ND

The goal is to extract the listing titles that are circled in the screenshot.




Import the HtmlAgilityPack namespace:

using HtmlAgilityPack;
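The following is a minimal sketch of how the titles could be fetched with HtmlAgilityPack, not necessarily the exact code used for this post. It assumes each listing title is an anchor element whose class contains "business-name"; that selector is a guess about the page markup, so inspect the page in your browser and adjust the XPath if it differs.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Search results page mentioned in the post.
        string url = "https://www.yellowpages.com/search?search_terms=software+develop&geo_location_terms=Sydney%2C+ND";

        // HtmlWeb downloads the page and parses it into an HtmlDocument.
        var web = new HtmlWeb();
        HtmlDocument doc = web.Load(url);

        // Assumption: each listing title is an <a> whose class contains "business-name".
        // Change this XPath if the live markup is different.
        HtmlNodeCollection titles =
            doc.DocumentNode.SelectNodes("//a[contains(@class, 'business-name')]");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (titles == null)
        {
            Console.WriteLine("No titles found; check the XPath selector.");
            return;
        }

        foreach (HtmlNode title in titles)
        {
            // Decode HTML entities such as &amp; before printing.
            Console.WriteLine(HtmlEntity.DeEntitize(title.InnerText).Trim());
        }
    }
}
```

The null check matters because an XPath query with no matches gives you null, and looping over it directly would throw a NullReferenceException.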



Once you run the code, you get the listing titles as output.


Now you are ready to start working on web scraping. The more you explore the HtmlAgilityPack namespace, its classes, and XPath, the more easily you will be able to extract data from complex HTML.
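As a starting point for exploring XPath, here is a small sketch that runs a few queries against an inline HTML string instead of a live page. The markup, class names, and values are made up for illustration and are not taken from any real site.

```csharp
using System;
using HtmlAgilityPack;

class XPathExamples
{
    static void Main()
    {
        // Hypothetical markup, invented for this example.
        string html = @"
<div class='result'>
  <a class='business-name' href='/acme'>Acme Software</a>
  <span class='phone'>555-0100</span>
</div>
<div class='result'>
  <a class='business-name' href='/globex'>Globex Development</a>
</div>";

        // LoadHtml parses a string, so no network request is needed.
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Select elements by tag name and attribute value.
        foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@class='business-name']"))
        {
            // GetAttributeValue returns the fallback value when the attribute is missing.
            string href = link.GetAttributeValue("href", "");
            Console.WriteLine(link.InnerText + " -> " + href);
        }

        // Query relative to each parent node; the leading "." keeps the search inside it.
        foreach (HtmlNode result in doc.DocumentNode.SelectNodes("//div[@class='result']"))
        {
            // SelectSingleNode returns null when the node has no matching descendant.
            HtmlNode phone = result.SelectSingleNode(".//span[@class='phone']");
            Console.WriteLine(phone != null ? phone.InnerText : "(no phone listed)");
        }
    }
}
```

Testing selectors against a saved or inline snippet like this is a quick way to get the XPath right before pointing the code at a real page.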

