.NET Tutorials, Forums, Interview Questions And Answers

Web Spider/ Crawler

Posted By:    Posted Date: August 26, 2010    Points: 0    Category: ASP.Net

Hello everybody,

I'm trying to develop a web spider that retrieves data related to "Sports" from Twitter, Facebook, and other sites/blogs, so I can display it all on my page.


I just need to retrieve the information displayed on the page, but the problem I'm facing is that when I read the Twitter page, it contains only the JavaScript code that displays the data, not the data itself.


Is there any possibility of extracting this information so I can log it in a database, for example?

Thank you for your usual help  
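Pages that render their content with JavaScript often load it from a JSON feed or API, so one possible approach is to read that data directly instead of scraping the HTML. The sketch below is only an illustration under that assumption: the payload, its field names (`id`, `author`, `text`), the keyword list, and the `posts` table are all hypothetical and do not describe any real site's format. It filters the items for sports keywords and logs them to a SQLite database.

```python
import json
import sqlite3

# Hypothetical payload: the structure and field names are invented for
# illustration and do not match any real site's feed.
SAMPLE_FEED = """[
  {"id": 1, "author": "alice", "text": "Great football match tonight"},
  {"id": 2, "author": "bob", "text": "New recipe for dinner"},
  {"id": 3, "author": "carol", "text": "Tennis final starts at noon"}
]"""

# Assumed keyword list; a real spider would need a much richer filter.
SPORTS_KEYWORDS = ("football", "tennis", "basketball")


def filter_sports(feed_json, keywords=SPORTS_KEYWORDS):
    """Return the feed items whose text mentions any of the keywords."""
    items = json.loads(feed_json)
    return [
        item for item in items
        if any(word in item["text"].lower() for word in keywords)
    ]


def log_items(conn, items):
    """Store items in a 'posts' table, skipping ids that already exist."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts "
        "(id INTEGER PRIMARY KEY, author TEXT, text TEXT)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO posts (id, author, text) VALUES (?, ?, ?)",
        [(item["id"], item["author"], item["text"]) for item in items],
    )
    conn.commit()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    log_items(connection, filter_sports(SAMPLE_FEED))
    for row in connection.execute("SELECT id, author, text FROM posts ORDER BY id"):
        print(row)
```

Because the insert uses `INSERT OR IGNORE` on the primary key, running the spider repeatedly over the same feed does not create duplicate rows.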





More Related Resource Links

Spider in .NET: Crawl Web Sites and Catalog Info to Any Data Store with ADO.NET and Visual Basic .NE


Visual Basic .NET comes loaded with features not available in previous versions, including a new threading model, custom class creation, and data streaming. Learn how to take advantage of these features with an application that is designed to extract information from Web pages for indexing purposes. This article also discusses basic database access, file I/O, extending classes for objects, and the use of opacity and transparency in forms.

Mark Gerlach

MSDN Magazine October 2002

Crawler fails to register date properties of user profiles with the month of January, April, August


This seems to be a bug when the crawler searches the user profiles in MOSS 2007. When crawled, user profiles with an SPS-HireDate in the months of January, April, August and December will be detected, but a full-text (SQL) search returns those profiles without the HireDate field.

User profiles with HireDates in other months work correctly, returning the HireDate in the search.  And changing the month of a problematic user profile also fixes the problem.

This problem is also reflected in the fact that while we have 499 user profiles using the SPS-HireDate property,  the managed property page from the search section only has 350 items with the HireDate property.

We're running MOSS 2007 32bit with SP2 with an English language base and the Spanish language pack. I'd considered date format problems, but I can't imagine how some months would work, while others wouldn't.

Any ideas?

MOSS Crawler Issues

Today I noticed that the scheduled crawls are not running. I can manually start incremental crawls on the content sources, and they work fine. When the time comes for a scheduled crawl, the job never runs, but surprisingly the time for the next scheduled crawl adjusts to the next proper time.

I ran a manual incremental crawl at 4:49 PM today, and the indexed item count was 53287. Since then, more issues have been created in my application, and the automatic incremental crawl was due three times (5:00 PM, 5:30 PM and 6:00 PM), but the indexed item count is still 53287. The next crawl time has been updated to the next scheduled time, 6:30 PM, yet the crawl logs show the last crawl time as 4:49 PM.

Why is the automatic incremental crawl not running even though the next crawl time is adjusted? Please help me solve this problem. Thanks! Arup R (MCTS)

Spider Charts?

Hi, I want to make spider charts like this one: http://www.nevron.com/Gallery.ChartForActiveX.RadarChartsGallery.aspx

I want to know whether I can prepare it using Reporting Services 2008. If not, please suggest some other good tool. I am using SQL Server 2008 as the database and working with Visual Studio 2008.

Regards, ap.

Search crawler 'breaks' Document ID link

Hi all - we have a SharePoint 2010 development server up and running (very much an out-of-the-box configuration). The Document ID functionality has been enabled and appears to be working OK: each document has its own unique ID, as expected.

In a team site (Corporate Comms), we have a document library where our corporate logo has been loaded (a small .bmp file). This image file has a unique Document ID. On the top-level site, I have created a link to the corporate logo so that it appears on the 'entry' web page. The link to the image is: http://<servername>/_layouts/DocIdRedir.aspx?ID=CH07-7-1 where CH07-7-1 is the Document ID.

When first created, everything works OK and the image appears on the front page. However, once a search crawl has been run (either full or incremental), the image disappears and is replaced by a red cross (i.e. the URL link has been broken). I have tried this with a link to a Word document (as opposed to a .bmp image), and that still works OK. I have also tried storing the image in a picture library (as opposed to a document library), and the same thing happens.

Is there something funny happening that is specific to the .bmp image/Document ID/crawler? Is there anything I need to configure? Any help would be appreciated. Thanks, Richard.


Hi, I am storing some web stats in a table, and I have noticed that Request.Browser.Crawler does not catch all crawlers, so my visitor counts are incorrect. What is the best way to make sure that only visits from valid web browsers are stored?
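One common heuristic for this kind of filtering is to check the User-Agent header for substrings that crawlers typically include. The sketch below illustrates the idea in Python rather than ASP.NET; the pattern list is an assumption and far from exhaustive, the function name is hypothetical, and a crawler that sends a browser-like User-Agent will still get through.

```python
import re

# Assumed, non-exhaustive list of substrings that often appear in crawler
# User-Agent strings; extend it from your own logs.
BOT_PATTERN = re.compile(r"bot|crawl|spider|slurp", re.IGNORECASE)


def looks_like_crawler(user_agent):
    """Heuristically decide whether a User-Agent string belongs to a crawler."""
    if not user_agent:
        # Assumption: a request with no User-Agent is not a normal browser.
        return True
    return bool(BOT_PATTERN.search(user_agent))


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "Mozilla/5.0 (Windows NT 6.1; rv:10.0) Gecko/20100101 Firefox/10.0",
        "",
    ]
    for agent in samples:
        print(repr(agent), looks_like_crawler(agent))
```

A check like this could be combined with the built-in one, storing a visit only when both report a regular browser.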

SharePoint 2010 Search Crawler excluding items

Hi All,

I'm trying to set up the Search functionality in SharePoint Server 2010, but I'm having trouble getting it working. I have set up a crawler for the site with a user that has Full Read permissions to everything. I also set up a crawl rule for the site including all items in the path, with the "Follow links on the URL without crawl itself" option checked.

The crawler runs across the entire Site Collection and sub-sites, but I keep seeing warnings that content was excluded, and the search returns no results. The error message I keep seeing is "The content for this address was excluded by the crawler because this item was marked with a no-index meta-tag. To index this item, remove the meta-tag and recrawl."

I have looked for a no-index meta-tag and have been unable to find it anywhere. I have also searched for information on this issue and set the site settings to Always index all WebParts and to Appear in Search Results. Neither of those settings appears to have helped.

Does anyone know what I am doing wrong, or know of good documentation that could point me in the right direction? Thanks
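When hunting for a no-index meta-tag, it can help to inspect the HTML the server actually returns rather than the page source in the designer. The following is a minimal sketch, assuming the rendered HTML of a page has already been saved to a string; the class and function names are hypothetical. It reports whether a `<meta name="robots">` tag carries a `noindex` directive, and it does not cover other ways a page may be excluded from indexing.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases tag and attribute names; values can be None.
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            content = attributes.get("content", "")
            self.directives.extend(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def has_noindex(html):
    """Return True if the HTML contains a robots meta tag with 'noindex'."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


if __name__ == "__main__":
    page = '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"/></head></html>'
    print(has_noindex(page))
```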

What is the SharePoint profile crawler? How does it work?

What is the SharePoint profile crawler? How does it work?

Search returns nothing, crawler works fine (I believe) in MOSS 2007



I did some research in the forum before posting, but I can't seem to find a real solution for my problem, even though there are many threads discussing similar or maybe even the same problem. I will describe the problem in short, and then if someone can help me out and needs more info, I can give more details.

So, I am new in my enterprise, and also to "real" IT (I recently graduated with a master 2), and I am supposed to administer the internal IT of the company. The SharePoint expert of my company just changed jobs, so I was kinda hired to replace him. The problem is, I have limited experience with SharePoint; I just passed my 70-630 yesterday, but I am no expert.

The problem with our SharePoint is that the search won't work (doh!). The crawler is working fine and we get things indexed, but then searches return nothing. I tried playing with scopes, but as far as I can see, it's not that. What's weird is that apparently the search worked fine at some point in the past, then just mysteriously stopped. I discussed it with the ex-admin and we tried to fix it, but we couldn't get it working.


I am kinda stuck, don't know where to look and which settings I need to check, on what le

Crawler only indexes 1,000,002


I have set up an external data type pointing to a SQL database with just over 8 million records. I need SharePoint to be able to search these records, so I set up a crawler, but it only indexes 1,000,002. It does report 1 error (which happens near the beginning of indexing), but that error won't display when going through the errors screen. The crawler has indexed data, as I do get some search results back. Is there a limit on the amount of data a crawler will process?

I do apologize if this is obvious, but I am only just learning SharePoint.

Search: How to add a new language to languages detected by crawler?


Dear all,


I think my question is not so well formulated, so let me explain:

In the advanced search web part, it is possible to add a filter on the language of the results. By default, the list of languages offered to end users is English, French, German and some others.

When editing the properties of that web part, it is possible to modify that list to offer different languages (in the "Languages" element of the XML). BUT you can only choose from the languages previously defined (the "LangDefs" element).


What I would like to do is to add/install a new language definition, because the language I am interested in is not listed under "LangDefs".

Obviously, just editing the XML won't do the trick.


From what I understand, the language of documents is automatically detected by the IFilter when crawling documents.

So, do I need a new IFilter for my specific language?

I just installed the language pack for SharePoint, so maybe it also includes the new IFilter?


And once the correct crawler is installed, where do I find the LangDef information (including the custom id for the language, which is not a standard language id)?


I hope someone here is able to help me.

Many thanks in advance!




I want to write code so that when people search for words in search engines, and those words are similar to the contents of my website, they come to my website, like a crawler.

But I don't know how I should write it.

Please, if anybody knows, assist me.

Crawler could not communicate with server


I'm moving our SharePoint Server 2007 from 32-bit to 64-bit on Windows Server 2008, using this white paper


I have successfully joined the 64-bit SharePoint server on Windows Server 2008 to the existing farm.

My current farm topology:

Exchange 2003 - mail server

SQL 2005 - database server

MOSS2007 32bit on Win2003 server 32bit- web/app server

MOSS2007 64bit on Win2008 server 64bit - web/app server

I've successfully stopped the Office SharePoint Server Search service on the 32-bit machine and started it on the 64-bit machine, but when I try to start a full crawl, only the MySite and SSP content is crawled, not the rest of the portal content.

Crawl log: http://moss   The crawler could not communicate with the server. Check that the server is available and that the firewall acc

SharePoint 2010 Crawler runs for days



I've installed FAST Search for SharePoint 2010 on Windows 2008 R2 with SQL 2008 R2 as the database server. When we try to crawl file share content using SharePoint's crawler, it takes a very long time to complete. Even for a folder with a small number of files (<10), it takes 10 hours to complete.

Any ideas?


Sharepoint Crawler and Basic Authentication issue



We have a MOSS site which has basic authentication enabled. Everything is working fine except SharePoint search.

We have investigated and found that if basic authentication is implemented, the SharePoint crawler cannot access the site and won't work.

Is there any way to resolve this issue?

Our MOSS environment :

MOSS 2007 SP1.

Windows 2008 server.



Jasjeet Singh

Crawler Diskspace


Hi everybody,

Under "Search Administration" I can see the entry "Server status xxxxx : D: 746GB". What exactly does the 746GB mean? It's funny, because our content is between 80 and 100GB.

Any Ideas?

Crawler not working after switching url from http to https


The crawler on the top right corner of the page in the header area is not working. The search box drop-down has "This site: Intranet", with a type-in search box to the right of it. It returns 0 results for any search. In the admin console's application settings search area, the URL was changed and the site was recrawled, with no change.

Now on the welcome page there is an advanced search web part which works correctly. In site actions I set the search scope for the site.

How can I get "This site: intranet" search to crawl the site correctly?  Is there something else that needs to be changed since the site went from http to https?

Thanks in advance for any assistance.
