
Crawl Rules

Posted By:      Posted Date: September 29, 2010    Points: 0   Category: SharePoint

I'm having difficulty understanding crawl rules. In what kind of situation would you add an include rule (other than to include part of a website before excluding the rest)? Does it even make sense to have include rules after exclude rules at all?

Let's say I want to crawl Google.com and have set it up as a content source. Is there any point in adding http://www.google.com/ as an include crawl rule?
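For concreteness, here is a minimal sketch (not part of the original post) of creating include and exclude rules through the Microsoft.Office.Server.Search.Administration object model; the site URL and paths are placeholders. Crawl rules are evaluated in order and the first matching rule wins, so an include rule placed after an exclude rule that already matches the same URLs has no effect, which is essentially the scenario the question describes.

using System;
using Microsoft.Office.Server.Search.Administration;
using Microsoft.SharePoint;

class CrawlRuleSketch
{
    static void Main()
    {
        // Placeholder site; SearchContext.GetContext resolves the search
        // application (SSP) associated with it.
        using (SPSite site = new SPSite("http://intranet"))
        {
            Content content = new Content(SearchContext.GetContext(site));

            // Include the one sub-path that should still be indexed. Because the
            // first matching rule applies, this must come before the broader exclude.
            content.CrawlRules.Create(
                CrawlRuleType.InclusionRule, "http://www.google.com/finance*");

            // Exclude the rest of the host.
            content.CrawlRules.Create(
                CrawlRuleType.ExclusionRule, "http://www.google.com/*");
        }
    }
}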


More Related Resource Links

Cannot set Host Distribution Rules with two Crawl databases, crawl component cannot be dismounted

When I try to set Host Distribution Rules I get the following error: "Redistribution status: Failed - Crawl component GUID-crawl-5 on SERVERNAME cannot be dismounted. Check that the server is available."

Farm topology: 2 front-end servers, 2 query servers with a partitioned index, 2 index servers with 2 crawl components and 2 crawl databases, 2 application servers.

The 2 crawl components then stay in the status Initializing; when retrying (the only option), I get the same error again. I tried the following steps:
- Delete the crawl component that cannot be dismounted --> stops with an error, server cannot be contacted
- Move the crawl component to crawl database 1 --> error
- Reboot the crawl server
- Take the crawl server out of the farm and rejoin it, then delete the crawl component --> this worked, and I could rebuild the search topology
- Try to set the Host Distribution Rules again --> same error as in the beginning --> grr

Any ideas?

SharePoint search server 2010 crawl rules

My client wants to create a number of scopes by crawling specific subsites of a CMS 2.0 site. The CMS site is crawled as a website and security is ignored (i.e. results are not security trimmed).

As an example, they want to create a scope called "Audit". This scope will use a content source which crawls all content starting at http://server/services/audit and http://server/wssservices/audit. The first is the CMS 2.0 site; the second is a WSS 3.0 site that contains documents for the CMS site.

I set up the content source with start addresses of http://server/services/audit and http://server/wssservices/audit, with the crawl settings set to "only crawl within the server of each start address". Additionally, I have created a rule with path http://server/services/audit* and set the configuration to "Include all items in this path", with "Crawl SharePoint content as http pages" also selected. I have created a rule with path http://server/wssservices/audit* with the same configuration settings, except that "Crawl SharePoint content as http pages" is not selected. I have also performed a full crawl after creating the content source and crawl rules.

What I would expect to happen is that only results from http://server/services/audit or documents linked from http://server/wssservices/audit would show in the results.
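A minimal sketch, assuming the scope's content source already exists, of creating the two inclusion rules described above through the search administration object model; only the CMS path sets CrawlAsHttp, which corresponds to the "Crawl SharePoint content as http pages" checkbox. The server URL comes from the post; everything else is illustrative.

using Microsoft.Office.Server.Search.Administration;
using Microsoft.SharePoint;

class AuditCrawlRules
{
    static void Main()
    {
        using (SPSite site = new SPSite("http://server"))
        {
            Content content = new Content(SearchContext.GetContext(site));

            // CMS 2.0 content, crawled as plain HTTP pages.
            CrawlRule cmsRule = content.CrawlRules.Create(
                CrawlRuleType.InclusionRule, "http://server/services/audit*");
            cmsRule.CrawlAsHttp = true;
            cmsRule.Update();

            // WSS 3.0 document site, crawled normally.
            content.CrawlRules.Create(
                CrawlRuleType.InclusionRule, "http://server/wssservices/audit*");
        }
    }
}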

Input Validation: Enforcing Complex Business Data Rules with WPF


Windows Presentation Foundation has a rich data binding system that includes flexible support for business data validation. We take a look at implementing some complex data input validation scenarios that include customized data errors for users.

Brian Noyes

MSDN Magazine June 2010
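One of the building blocks the article above covers is surfacing per-property errors to WPF bindings; the following is a minimal sketch of that idea using IDataErrorInfo, with an invented Order class and rule.

using System.ComponentModel;

public class Order : IDataErrorInfo
{
    public int Quantity { get; set; }

    // Object-level error; not used for per-property validation.
    public string Error { get { return null; } }

    // WPF calls this indexer for each bound property when
    // ValidatesOnDataErrors=True is set on the binding.
    public string this[string propertyName]
    {
        get
        {
            if (propertyName == "Quantity" && (Quantity < 1 || Quantity > 100))
                return "Quantity must be between 1 and 100.";
            return null;   // null or empty means the value is valid
        }
    }
}

A binding such as Text="{Binding Quantity, ValidatesOnDataErrors=True, UpdateSourceTrigger=PropertyChanged}" then marks the control invalid and exposes the message through Validation.Errors.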

Secure Habits: 8 Simple Rules For Developing More Secure Code


Never trust data, model threats against your code, and other good advice from a security expert.

Michael Howard

MSDN Magazine November 2006

Draft a Rich UI: Ground Rules for Building Enhanced Windows Forms Support into Your .NET App


In this article, the winning Windows Forms duo of Chris Sells and Michael Weinhardt team up again to explore lots of new features and additions to Windows Forms 2.0 that will let you build more flexible, feature-rich controls, get better resource management and more powerful data binding, and make your development life a whole lot more fun.

Michael Weinhardt and Chris Sells

MSDN Magazine May 2005
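As a small illustration of the Windows Forms 2.0 data binding the article above discusses, here is a sketch that binds a list of objects to a DataGridView through a BindingSource; the Customer type and the data are invented for the example.

using System;
using System.Collections.Generic;
using System.Windows.Forms;

public class Customer
{
    public string Name { get; set; }
    public string City { get; set; }
}

public class CustomerForm : Form
{
    public CustomerForm()
    {
        var grid = new DataGridView { Dock = DockStyle.Fill };

        // The BindingSource sits between the data and the grid and manages
        // currency and change notification for bound controls.
        var customers = new BindingSource
        {
            DataSource = new List<Customer>
            {
                new Customer { Name = "Contoso",  City = "Seattle" },
                new Customer { Name = "Fabrikam", City = "Redmond" }
            }
        };

        grid.DataSource = customers;
        Controls.Add(grid);
    }

    [STAThread]
    static void Main() { Application.Run(new CustomerForm()); }
}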

C++ Rules: Power Your App with the Programming Model and Compiler Optimizations of Visual C++


Many programmers think that C++ gets good performance because it generates native code, but even if your code is completely managed you'll still get superior performance. In Visual Studio 2005, the C++ syntax itself has been greatly improved to make it faster to write. In addition, a flexible language framework is provided for interacting with the common language runtime (CLR) to write high-performance programs. Read about it here.

Kang Su Gatlin

MSDN Magazine January 2005

Bugslayer: Three Vital FXCop Rules


In the June 2004 installment of the Bugslayer column, I introduced the amazing FxCop, which analyzes your .NET assemblies for errors and problems based on code that violates the .NET Design Guidelines.

John Robbins

MSDN Magazine September 2004

Web Services: Extend the ASP.NET WebMethod Framework with Business Rules Validation


In an earlier article the authors showed how to build a custom WebMethods extension that provides XML Schema validation, a function that is lacking in ASP.NET. In the process they established a foundation for enforcing business rules during the deserialization of XML data. The technique, which is described in this article, uses declarative XPath assertions to test business rule compliance. In building this business rules validation engine, the authors integrate the validation descriptions into the WSDL file that is automatically generated by the WebMethod infrastructure. Finally, they demonstrate how to extend wsdl.exe, the tool that generates WebMethod proxy/server code from WSDL files, to make use of their extensions.

Aaron Skonnard and Dan Sullivan

MSDN Magazine August 2003
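The full WebMethod extension is beyond a short excerpt, but the core mechanism the article describes, testing a declarative XPath assertion against incoming XML, can be sketched in a few lines; the document and the rule below are made up for illustration.

using System;
using System.Xml;
using System.Xml.XPath;

class XPathAssertionSketch
{
    static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<Order><Quantity>5</Quantity><UnitPrice>12.50</UnitPrice></Order>");

        // A business rule expressed as a boolean XPath assertion.
        string assertion = "number(/Order/Quantity) > 0 and number(/Order/UnitPrice) > 0";

        XPathNavigator nav = doc.CreateNavigator();
        bool valid = (bool)nav.Evaluate("boolean(" + assertion + ")");

        if (!valid)
            throw new InvalidOperationException("Business rule violated: " + assertion);

        Console.WriteLine("Order passed validation.");
    }
}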

Spider in .NET: Crawl Web Sites and Catalog Info to Any Data Store with ADO.NET and Visual Basic .NET


Visual Basic .NET comes loaded with features not available in previous versions, including a new threading model, custom class creation, and data streaming. Learn how to take advantage of these features with an application that is designed to extract information from Web pages for indexing purposes. This article also discusses basic database access, file I/O, extending classes for objects, and the use of opacity and transparency in forms.

Mark Gerlach

MSDN Magazine October 2002
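The article's spider is written in Visual Basic .NET; purely as an illustration of the crawl loop it describes (fetch a page, harvest links, queue them), here is a compact C# sketch with a placeholder start URL and a deliberately naive href regex.

using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

class MiniSpider
{
    static void Main()
    {
        var pending = new Queue<string>(new[] { "http://example.com/" });
        var seen = new HashSet<string>();

        using (var client = new WebClient())
        {
            while (pending.Count > 0 && seen.Count < 10)   // small cap for the demo
            {
                string url = pending.Dequeue();
                if (!seen.Add(url)) continue;              // skip pages already visited

                string html;
                try { html = client.DownloadString(url); }
                catch (WebException) { continue; }         // skip pages that fail to load

                Console.WriteLine("{0} ({1} chars)", url, html.Length);

                // Harvest absolute links; a real spider would parse the HTML properly.
                foreach (Match m in Regex.Matches(html, "href=\"(http[^\"]+)\""))
                    pending.Enqueue(m.Groups[1].Value);
            }
        }
    }
}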

BizTalk: Implement Design Patterns for Business Rules with Orchestration Designer


Because the value of good software planning and design should never be underestimated, it can be beneficial to use one of the many existing design patterns as a foundation for solving some of your toughest architecture problems. This article describes several traditional design patterns including the Observer pattern and the Dispatcher pattern, elaborates on their structures, what they're used for, and how they can help you build a BizTalk-based solution. Following this is a discussion on using the BizTalk Orchestration Designer to build designs and integrate existing business processes.

Christian Thilmany and Todd McKinney

MSDN Magazine October 2001
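Since the article leans on the Observer pattern, a plain C# rendering of that pattern (independent of BizTalk, with invented type names) may help fix the idea: the subject raises an event and has no knowledge of who is listening.

using System;

public class OrderReceivedEventArgs : EventArgs
{
    public string OrderId { get; set; }
}

// Subject: publishes a notification when an order arrives.
public class OrderIntake
{
    public event EventHandler<OrderReceivedEventArgs> OrderReceived;

    public void Receive(string orderId)
    {
        var handler = OrderReceived;
        if (handler != null)
            handler(this, new OrderReceivedEventArgs { OrderId = orderId });
    }
}

class Program
{
    static void Main()
    {
        var intake = new OrderIntake();

        // Observers subscribe without the subject knowing anything about them.
        intake.OrderReceived += (s, e) => Console.WriteLine("Billing saw order " + e.OrderId);
        intake.OrderReceived += (s, e) => Console.WriteLine("Shipping saw order " + e.OrderId);

        intake.Receive("PO-1001");
    }
}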

Can't start crawl because of index move operation


Hi there,

I can't start a crawl task. The log says "Deleted by the gatherer (The start address or content source that contained this item was deleted and hence this item was deleted.)", but I did not change the path of any content source, and when I try to start the crawl job it says "Crawling might be paused because a backup or an index move operation is in progress. Are you sure you want to resume this crawl?"

What is an index move operation? What should I do? I'd greatly appreciate a solution. Thanks in advance.



"Content for this URL is excluded by the server because a no-index attribute." in crawl logs


Hi All,

I am getting the following error message in the crawl logs:

" Content for this URL is excluded by the server because a no-index attribute. "

Any help in this regard will be greatly appreciated.



crawl stuck on "stopping"


Hi all

I have a problem with the MOSS 2007 search crawl. It was working fine, then suddenly it stopped showing new content. I tried to troubleshoot and saw that it had been running a crawl for more than 2000 hours. I stopped the crawl, and now it's stuck on "stopping".

I have googled and seen that a lot of people have had this problem, and that it might be caused by a maintenance job on the SQL Server (2005) creating duplicate index values in the search database, or by not having SP2 for SQL Server. I checked, and we didn't have either problem.

Has anybody here run into this problem and fixed it? :)


Only crawl one site collection

Hi. We have an intranet with about 100 site collections. How can I set up one of those to be in a separate content source that can be crawled more often? Do I need to make two content sources: one containing the other 99 site collections with the setting "Crawl only the SharePoint Site of each start address", and the other containing my prioritized site collection with the same setting? I would also like to ask whether crawl rules have any effect on the order in which content is crawled. If I set a certain site to be included with order 1, will that site always be crawled first? //Niclas
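A dedicated content source does let a single site collection follow its own crawl schedule. As a hedged sketch (URLs and names are placeholders, and this is simply the object-model equivalent of what Central Administration does), the prioritized site collection could get its own content source like this:

using System;
using Microsoft.Office.Server.Search.Administration;
using Microsoft.SharePoint;

class SeparateContentSource
{
    static void Main()
    {
        using (SPSite site = new SPSite("http://intranet"))
        {
            Content content = new Content(SearchContext.GetContext(site));

            var cs = (SharePointContentSource)content.ContentSources.Create(
                typeof(SharePointContentSource), "Priority site collection");

            cs.StartAddresses.Add(new Uri("http://intranet/sites/priority"));

            // "Crawl only the SharePoint Site of each start address"
            cs.SharePointCrawlBehavior = SharePointCrawlBehavior.CrawlSites;
            cs.Update();

            cs.StartFullCrawl();   // later crawls can run on this source's own schedule
        }
    }
}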

Extracting Association Rules from Data Mining Model

Hi. I have successfully created a data mining model using the association algorithm, and have deployed and processed it. I need to extract all the rules generated by the model. I know that you normally query the model using DMX prediction queries, but in this case I need to extract the rules to a separate table for further processing. I have tried the following approaches unsuccessfully:

1. A linked server in SQL Server Management Studio, running the DMX query through OpenQuery. The DMX query looks like this: SELECT FLATTENED NODE_CAPTION, NODE_SUPPORT, NODE_PROBABILITY, MSOLAP_NODE_SCORE FROM DataMiningModel.CONTENT WHERE NODE_TYPE = 8. On small models this method works. On larger models with many rules I receive an exception, "XML for Analysis parser: The XML for Analysis request timed out", before it completes. This always happens after 70 minutes. I might have missed a timeout option?

2. Using SSIS to run the same DMX query as above. The method is presented here: http://www.sqlservercentral.com/articles/MDX/64697/. This works on small models. On larger models that did not work with method 1, it returns rules, but the number of rules returned is sometimes different for the same package run multiple times. For the largest models it simply returns 0 rules. I suspect that the same XML parser error happens under the hood of SSIS.

I'm currently stuck and need some input.
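One avenue to try, sketched under the assumption that it is the client-side command timeout that expires: run the same DMX from ADOMD.NET with a long CommandTimeout and stream the rows straight into SQL Server with SqlBulkCopy. Server, catalog, model, and table names are placeholders, and the destination table is assumed to already exist with matching columns; server-side Analysis Services timeouts may also need raising.

using System.Data.SqlClient;
using Microsoft.AnalysisServices.AdomdClient;

class ExtractAssociationRules
{
    static void Main()
    {
        const string dmx =
            "SELECT FLATTENED NODE_CAPTION, NODE_SUPPORT, NODE_PROBABILITY, MSOLAP_NODE_SCORE " +
            "FROM [DataMiningModel].CONTENT WHERE NODE_TYPE = 8";

        using (var ssas = new AdomdConnection("Data Source=localhost;Catalog=MiningDB"))
        {
            ssas.Open();

            var cmd = new AdomdCommand(dmx, ssas);
            cmd.CommandTimeout = 7200;   // two hours, well past the 70-minute failure point

            using (AdomdDataReader reader = cmd.ExecuteReader())
            using (var bulk = new SqlBulkCopy(
                "Data Source=localhost;Initial Catalog=Staging;Integrated Security=SSPI"))
            {
                bulk.DestinationTableName = "dbo.AssociationRules";
                bulk.BulkCopyTimeout = 0;        // no timeout on the SQL Server side
                bulk.WriteToServer(reader);      // streams rows instead of buffering them all in memory
            }
        }
    }
}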

SharePoint crawl errors on files which are not present

All, I'm noticing 2 errors in my crawl logs. Neither of the files exists anywhere on our site. The URLs are http://.../forms/repair.aspx and http://.../forms/combine.aspx, and the error message is 'Error in the Microsoft Windows SharePoint Services Protocol Handler'. Our crawl normally takes about 3 and a half hours; recently it has been taking 5-6 hours. These 2 errors are logged at the end of the crawl. While the crawl is running, I see the success count growing, and at about 3 and a half hours into the process the success count stops growing. I'm not sure what the crawl is doing for the next 2 or so hours, but it finally logs the 2 errors mentioned earlier at the end of the crawl, then completes. I have tried resetting the crawled content and changing the index location of the SSP, but neither has worked. I have also tried excluding the path to these two files with crawl rules, but that hasn't worked either. I am on SharePoint 2007 SP2. Any ideas? Thanks

Cannot crawl SharePoint site and My Site after database attach upgrade from SharePoint 2007 to 2010.

After a database attach upgrade of our site and My Site web applications from SharePoint 2007 to 2010, I ran a full crawl and get "The crawler could not communicate with the server. Check that the server is available and that the firewall access is configured correctly. If the repository was temporarily unavailable, an incremental crawl will fix this error. ( Error from SharePoint site: HttpStatusCode ServiceUnavailable The request failed with HTTP status 503: Service Unavailable. )" for My Site, and "Access is denied. Verify that either the Default Content Access Account has access to this repository, or add a crawl rule to crawl this repository. If the repository being crawled is a SharePoint repository, verify that the account you are using has "Full Read" permissions on the SharePoint Web Application being crawled." for the SharePoint site. The content access account for search is "db_owner" on both the site and My Site content databases. How do I solve this problem?
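For the "Access is denied" half of the problem, the error text itself points at the usual fix: the content access account needs "Full Read" on the web application via a user policy; db_owner on the content databases does not by itself grant that. A minimal sketch with a placeholder URL and account name:

using System;
using Microsoft.SharePoint.Administration;

class GrantFullRead
{
    static void Main()
    {
        SPWebApplication webApp = SPWebApplication.Lookup(new Uri("http://intranet"));

        // Add a web application policy granting the crawl account Full Read.
        SPPolicy policy = webApp.Policies.Add(@"DOMAIN\svc_search", "Search content access account");
        policy.PolicyRoleBindings.Add(
            webApp.PolicyRoles.GetSpecialRole(SPPolicyRoleType.FullRead));

        webApp.Update();
    }
}

The 503 Service Unavailable on My Site is a separate issue (the crawler cannot reach the web application at all) and is not addressed by this sketch.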