Cannot crawl complex URL's without setting a site-wide rule to 'crawl as http content'. Help!

Posted Date: September 10, 2010    Category: SharePoint
I have pages within a site that use a query string to provide dynamic data to the user (http://<site>/pages/example.aspx?id=1). I can get the content source to index these dynamic pages only if I create a rule that sets the root site (http://<site>/*) to 'include complex URLs' and 'crawl SharePoint content as HTTP content'. This is NOT acceptable: changing the crawling protocol from SharePoint's to HTTP prevents any metadata from being collected on the indexed items, and the managed metadata feature is a critical component of our SharePoint applications.

To dispel any suspicion that this is simply a configuration error on my part, see http://social.technet.microsoft.com/Forums/en-US/sharepointsearch/thread/4ff26b26-84ab-4f5f-a14a-48ab7ec121d5 . The issue described there is exactly my problem, but its solution is unusable for the reason above.

Keep in mind this is an external publishing site, and my search scope is trimmed using content classes to include only documents and pages (STS_List_850 and STS_ListItem_DocumentLibrary). Creating a new 'Web Site' content source and adding it to my scope presents two problems: duplicate content in the scope, and no content class defining it that I know of. What options do I have?
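For what it's worth, the search administration object model exposes the two crawl-rule flags independently, so it may be worth testing whether complex URLs can be followed without switching the protocol to HTTP. Below is a minimal sketch against the SharePoint 2010 object model (the legacy SearchContext entry point still works in 2010), run from a console app on a farm server; the URL is illustrative, and whether the SharePoint protocol handler actually honors FollowComplexUrls on its own is exactly the open question here:

using System;
using Microsoft.SharePoint;
using Microsoft.SharePoint.Administration;
using Microsoft.Office.Server.Search.Administration;

class CrawlRuleSketch
{
    static void Main()
    {
        // Resolve the default search application for the farm.
        SPServiceContext ctx = SPServiceContext.GetContext(
            SPServiceApplicationProxyGroup.Default,
            SPSiteSubscriptionIdentifier.Default);
        SearchContext searchCtx = SearchContext.GetContext(ctx);
        Content content = new Content(searchCtx);

        // Inclusion rule for the dynamic pages (URL is illustrative).
        CrawlRule rule = content.CrawlRules.Create(
            CrawlRuleType.InclusionRule, "http://yoursite/pages/*");
        rule.FollowComplexUrls = true;  // follow URLs that contain '?'
        rule.CrawlAsHttp = false;       // keep the SharePoint protocol handler
        rule.Update();
    }
}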

More Related Resource Links

"Content for this URL is excluded by the server because a no-index attribute." in crawl logs


Hi All,

I am getting the following error message in the crawl logs:

"Content for this URL is excluded by the server because a no-index attribute."

Any help in this regard will be greatly appreciated.
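If the source being crawled is SharePoint itself, this warning usually means the list or web is flagged to be excluded from search results; for a list, that flag surfaces in the object model as SPList.NoCrawl ("Allow items from this list to appear in search results" in list settings). A minimal sketch for locating such lists, assuming a farm console app; the site URL is illustrative:

using System;
using Microsoft.SharePoint;

class NoCrawlSketch
{
    static void Main()
    {
        using (SPSite site = new SPSite("http://yoursite"))
        using (SPWeb web = site.OpenWeb())
        {
            foreach (SPList list in web.Lists)
            {
                if (list.NoCrawl)
                {
                    Console.WriteLine("Excluded from crawl: " + list.Title);
                    // list.NoCrawl = false;  // uncomment to re-enable indexing
                    // list.Update();
                }
            }
        }
    }
}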



Only crawl one site collection

Hi! We have an intranet with about 100 site collections. How can I set up one of them in a separate content source that can be crawled more often? Do I need to make two content sources, one containing the other 99 site collections and the other containing my prioritized site collection, both with the setting "Crawl only the SharePoint Site of each start address"?

I would also like to ask whether crawl rules have any effect on the order in which content is crawled. If I include a certain site with order 1, will that site always be crawled first? //Niclas
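A dedicated content source per prioritized site collection is the usual approach, and the "Crawl only the SharePoint Site of each start address" option maps to SharePointCrawlBehavior.CrawlSites in the object model. A minimal sketch, assuming a Content object obtained as in the earlier example; the source name and start address are illustrative. (As far as I know, crawl rule order only controls which rule matches a given URL, not the order in which content is crawled.)

using System;
using Microsoft.Office.Server.Search.Administration;

class PriorityContentSourceSketch
{
    static void Create(Content content)
    {
        SharePointContentSource cs = (SharePointContentSource)content.ContentSources
            .Create(typeof(SharePointContentSource), "Priority site collection");
        cs.StartAddresses.Add(new Uri("http://intranet/sites/priority"));

        // "Crawl only the SharePoint Site of each start address":
        cs.SharePointCrawlBehavior = SharePointCrawlBehavior.CrawlSites;
        cs.Update();

        // Crawl schedules can then be set on this source independently of
        // the source that holds the other 99 site collections.
    }
}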

Cannot crawl SharePoint site and My Site after database attach upgrade from SharePoint 2007 to 2010.

After a database attach upgrade of our site and My Site from SharePoint 2007 to 2010, I ran a full crawl and get "The crawler could not communicate with the server. Check that the server is available and that the firewall access is configured correctly. If the repository was temporarily unavailable, an incremental crawl will fix this error. ( Error from SharePoint site: HttpStatusCode ServiceUnavailable The request failed with HTTP status 503: Service Unavailable. )" for My Site, and "Access is denied. Verify that either the Default Content Access Account has access to this repository, or add a crawl rule to crawl this repository. If the repository being crawled is a SharePoint repository, verify that the account you are using has "Full Read" permissions on the SharePoint Web Application being crawled." for the SharePoint site.

The content access account for search is "db_owner" on both content databases. How do I solve this problem?
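Note that db_owner on the content databases is not what the crawler checks; the "Full Read" the error asks about is a web application user policy. A minimal sketch for granting it from code, assuming a farm console app; the account name and URL are illustrative, and the same policy can be set in Central Administration under Policy for Web Application:

using System;
using Microsoft.SharePoint.Administration;

class FullReadPolicySketch
{
    static void Main()
    {
        SPWebApplication webApp = SPWebApplication.Lookup(
            new Uri("http://mysite.contoso.com"));

        // Grant the crawl account Full Read across the web application.
        SPPolicy policy = webApp.Policies.Add(
            @"CONTOSO\svc_crawl", "Search crawl account");
        policy.PolicyRoleBindings.Add(
            webApp.PolicyRoles.GetSpecialRole(SPPolicyRoleType.FullRead));
        webApp.Update();
    }
}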

An unrecognized HTTP response was received when attempting to crawl this item

I have just done a database attach upgrade of our servers, and so far everything has come up very nicely, except for the Search service. I cannot get Search to crawl our 4 web applications; the crawl finishes with errors every time:

"An unrecognized HTTP response was received when attempting to crawl this item. Verify whether the item can be accessed using your browser. ( Error from SharePoint site: HttpStatusCode GatewayTimeout The remote server returned an error: (504) Gateway Timeout. )"

Does anybody recognize this?

Server configuration:
- WFE: Windows Server 2008 R2, SharePoint 2010 Enterprise, SSL (wildcard certificate)
- DB: Windows Server 2008 R2, SQL Server 2008 R2

Things I've already tried:
- Used another browser
- Set DisableLoopbackCheck to 1
- iisreset
- Reset the index
- Modified the hosts file
- Verified the DB account
- Extended the timeout settings
- Turned off "warn on SSL errors"

Any help would be greatly appreciated; we need to go live in a couple of days. Cheers

Can you set a crawl rule to restrict crawling at a specific depth?

Say we have a start address http://contoso.com/depth1/depth2/depth3/ and we only want to crawl from depth3 and beyond (depth4+ is fine). Is this possible to configure with a crawl rule?
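There is no depth setting on a crawl rule itself (for 'Web Site' content sources, page depth limits live in the content source's crawl settings instead). One approximation is a pair of ordered rules: an inclusion rule for the deep path evaluated before a broad exclusion. A minimal sketch, assuming a Content object obtained as in the earlier example; note the crawler can still only reach depth3 content that it can discover by following links from the start address:

using Microsoft.Office.Server.Search.Administration;

class DepthRulesSketch
{
    static void Apply(Content content)
    {
        // Rule order matters: the first matching rule wins.
        CrawlRule include = content.CrawlRules.Create(
            CrawlRuleType.InclusionRule,
            "http://contoso.com/depth1/depth2/depth3/*");
        include.Priority = 0;   // evaluated first
        include.Update();

        CrawlRule exclude = content.CrawlRules.Create(
            CrawlRuleType.ExclusionRule, "http://contoso.com/*");
        exclude.Priority = 1;   // everything shallower is skipped
        exclude.Update();
    }
}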

Crawl of TWiki site will not complete and produces Warnings

My client's engineering team has an internal TWiki site (running on a Linux/Apache configuration) that has been up and running for years. We recently installed SharePoint 2007 Enterprise for the client and are trying to configure SharePoint search to crawl the internal TWiki site. FYI: TWiki, for those of you who don't know it, is an open source wiki solution; read about the product here --> http://twiki.org/

Anyway, we set up a new 'Web Site' content source via the SharePoint Central Administration that points at the TWiki site and kicked off the crawl. No problem there; the crawl fired up and went to work.

Here's the stumper: after over 24 hours, the crawl was still running. That's right, it ran for over 24 hours and still did not complete. Assuming that something was wrong, we went ahead and stopped the crawl. While the crawl was running, we were seeing the following behavior in the crawl log:

- Successfully crawls about 1,600-plus pieces of content on the TWiki site
- Produces over 250,000 warnings with this message --> "Content for this URL is excluded by the server because a no-index attribute."

This is stunning. I'm not a TWiki expert, but I find it hard to believe there are really 250,000 pieces of c…

Unable to crawl the content from any of the sites inside a web application


I created two new web applications; each contains one site collection with sites inside it. In one web application search is working, and in the other search is not working: when I try to crawl the content from the site it shows zero items, while the other site collection shows the crawled items. I am confused about why this happened. Why am I unable to crawl the content from the sites?

FAST Search Connector won't crawl my Content Sources


Can anyone help me figure out why I am not able to crawl the Content Sources for the FAST Search Connector?

The error from the ULS viewer is:

Failed to connect to 1sv-sp2010.wirestone.internal:13391 Failed to initialize session with document engine: Unable to resolve Contentdistributor

I followed the install steps found at http://technet.microsoft.com/en-us/library/ff381267.aspx, including the post install validation.  FAST seems to be working in every way except the crawl.

The port number 13391 was found in the Install_Info document: "Content Distributors (for GUI SSA creation): 1sv-sp2010.wirestone.internal:13391"



External Content Types + Search Service: Cannot crawl my external content type



I created an external content type by creating a new Visual Studio SharePoint project and creating a content type (the default Entity1 content type). I created a profile page for it and everything, but when I drilled into the content type in Central Administration - BCS, I saw it wasn't marked as crawlable.

I saw this similar post: http://social.msdn.microsoft.com/forums/en-us/sharepoint2010general/thread/281BCEFD-59EC-41CC-B948-458A4BDA9E49

So I then created an external content type through SPD, leveraging the same code, and created an external list and profile page. This time, when I drilled into the external content type in the BCS administration, it showed "Crawlable: Yes".

I figured at that point I was good to go, but when I went to my Search Service Application -> Content Sources -> New Content Source, selected Line of Business Data, and selected BDC, it still says "No external data sources to choose from."

I verified also that the account for crawling has permissions for the external content type.

Are there any other things I should be looking for? From everything I read this should "just work" now :)



Content crawls fail after additional crawl component is added.


I have two VM SharePoint servers in my farm connected to a FAST box on dedicated hardware, and I noticed that the content crawls have been kind of sluggish. I added another crawl component to my web front end and did the whole FAST certificate import, but when I try to run a crawl it fails and gives the top-level error: Access is denied. Verify that either the Default Content Access Account has access to this repository, or add a crawl rule to crawl this repository. If the repository being crawled is a SharePoint repository, verify that the account you are using has "Full Read" permissions on the SharePoint Web Application being crawled.

I made sure that the search services are running under the same service account and that the service account has full read access to my web apps. I haven't been able to find much documentation regarding multiple crawl components, so I figured I'd post here.

Search Topology:
1 Admin Component
2 Crawl Components (APP Server / WFE Server)
1 Admin DB
1 Crawl DB

Perhaps I'm going about improving my crawl performance the wrong way; if that's the case, any suggestions would be greatly appreciated.

HTTP 400 error when uploading an item after setting a custom edit form on a content type



What I am trying to do is create custom edit forms for different content types within the same document library. So basically I have done the following:

  • I have created two site content types and reconfigured my library to use CTs.
  • I then connected to my site using SPD.
  • When examining my library I can see the two CTs that I added earlier. I then created a new edit form for one of the CTs using the top ribbon menu, and in the drop-down box associated it with the CT. I did not select this as the default form for the library.
  • I then updated the library CT (not the site CT) with the URL to the form in the EditForm properties.

When I create a document and edit the properties it works fine - totally as expected. However, when I upload a document the default EditForm for the library is used, and I can select the CT to associate the document with. When I choose the CT with the custom form, the new edit form is not found and I get a page-not-found error - HTTP 400.

The full URL for this library is http://XXXX/functions/ehs/CSI

So the URL I have used is /functions/ehs/CSI/Forms/EditForm2.aspx (EditForm2.aspx is my form) and this is fine. I have even used CSI/Forms/EditForm2.aspx and this works... BUT ne…
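In case it helps to take SPD out of the equation: a list content type's custom form can also be assigned through the object model via SPContentType.EditFormUrl. A minimal sketch; the content type name is hypothetical, and the exact relative-URL base that EditFormUrl expects is an assumption worth verifying against a content type whose custom form already works:

using Microsoft.SharePoint;

class EditFormUrlSketch
{
    static void SetForm(SPWeb web)
    {
        SPList list = web.GetList("/functions/ehs/CSI");
        SPContentType ct = list.ContentTypes["My Document Type"]; // hypothetical

        // Assumption: the URL is taken relative to the web; verify the base
        // by inspecting a content type that already uses a custom form.
        ct.EditFormUrl = "functions/ehs/CSI/Forms/EditForm2.aspx";
        ct.Update();
    }
}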

Access denied when Search Service Application tries to crawl SharePoint content


I have just set up a new SP2010 environment (3 servers: WFE, App, SQL).

When I try to get my Search Service Application to crawl my main SP site and my MySite location, I get the following error in the crawl log:

"Access is denied. Verify that either the Default Content Access Account has access to this repository, or add a crawl rule to crawl this repository. If the repository being crawled is a SharePoint repository, verify that the account you are using has "Full Read" permissions on the SharePoint Web Application being crawled."

Things I have checked:
- I have ensured that the default access account has "Full Read" on the web application
- I set up crawl rules for both sources, specifying a service account that has admin access to the content on those sites
- I logged in to the SP sites using the service account that the Service Application uses to crawl
- I even created a brand new Search Service Application from scratch and got the exact same results

The only difference between this environment and my test environment, where search works just fine, is that this is the production environment, so it uses an FQDN with a host header: http://portal.company.org.
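Since the crawler is hitting an FQDN host header from the farm's own servers, this looks like the Windows loopback check, which rejects local requests made to a name that doesn't match the machine name. A sketch of the broad form of the registry fix, offered as an assumption worth testing (the BackConnectionHostNames multi-string value is the narrower alternative); run elevated on the crawl server, then reboot or at least restart IIS:

using Microsoft.Win32;

class LoopbackCheckSketch
{
    static void Main()
    {
        // Broad form of the fix: disable the loopback check entirely.
        using (RegistryKey lsa = Registry.LocalMachine.OpenSubKey(
            @"SYSTEM\CurrentControlSet\Control\Lsa", true))
        {
            lsa.SetValue("DisableLoopbackCheck", 1, RegistryValueKind.DWord);
        }
    }
}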


SP2010 SSL Crawl Fails - Accessing this site requires a client certificate.


Full error Message:  Accessing this site requires a client certificate. Specify a client certificate in the crawl rules.

I added the crawl rule and content source for the test site https://sp.xxxx.com/TestSite and tried specifying each available client certificate in the rules one by one, but the crawl still fails.





What kind of certificate does it need, and how do I enable it? I know that the *.xxxx.com certificate is used for our SSL.
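The *.xxxx.com certificate is an SSL server certificate; what the crawl rule wants is a client-authentication certificate whose private key the crawl service account can read. In the object model this is set on the rule via CrawlRule.SetCredentials with the certificate authentication type. A sketch under those assumptions; I have not verified the exact SetCredentials overload, so treat this as a pointer rather than a known-good call:

using Microsoft.Office.Server.Search.Administration;

class CertificateCrawlRuleSketch
{
    static void Apply(Content content)
    {
        CrawlRule rule = content.CrawlRules.Create(
            CrawlRuleType.InclusionRule, "https://sp.xxxx.com/TestSite/*");

        // Assumption: the second argument is the name of a client-auth
        // certificate available to the crawl account.
        rule.SetCredentials(
            CrawlRuleAuthenticationType.CertificateRuleAccess, "MyClientCert");
        rule.Update();
    }
}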


Exclusion Crawl Rule Not Working (MOSS 2007)


SharePoint experts,

I'm trying to exclude all the content within one of the document libraries (called NoCrawlLibrary) from being crawled and showing up in search results, but it doesn't seem to work correctly.


MOSS 2007 SP2 (Enterprise) on Windows Server 2003 Enterprise SP2 (32-bit), with SQL Server 2005 Enterprise SP3.


I have set up two SSPs (say, SSP1 and SSP2), with a site collection and a search center for each. The Default Content Access Account has access to both SSPs. In each site collection I have created a document library called "NoCrawlLibrary" whose items I intend the search engine to exclude from results.

On SSP1, the crawl rules are as follows:

- https://ssp1.moss:1001/ssp/admin/* (Exclude, order 1)

- https://ssp1.moss:1001/NoCrawlLibrary/* (Exclude, order 2)

- https://ssp1.moss:1001/* (Include, crawl as domain\svc_crawler, order 3)

On SSP2, the crawl rules are very…

Search issues with a PHP site that cannot be crawled


Hi everyone,

I have set up a content source for a PHP site but I am not able to crawl it. I have also added a php file type in the file types section.

My crawl rule has two rules:

1. http://*/*asdf*

2. http://*.*

I am able to crawl other sites, just not the PHP ones.

Can anyone please point me in the right direction?





Do I need a full crawl if I add a new web in a site collection?

Hello, do I need a full crawl if I add a new web in a site collection, or is an incremental crawl enough? Thanks

Recreated SSP "Content sources and crawl schedules" Error?


This is a follow-up question to an earlier thread.

Okay, I created a new SSP with a different name, and it ran fine. However, when I clicked "Content sources and crawl schedules", I got this error:

Could not find stored procedure 'dbo.proc_MSS_GetCrawlHistory'.   at Microsoft.SharePoint.Portal.Search.Admin.Pages.SearchAdminPageBase.ErrorHandler(Object sender, EventArgs e)
   at Microsoft.SharePoint.Portal.Search.Admin.Pages.SearchSSPAdminPageBase.OnError(EventArgs e)
   at System.Web.UI.Page.HandleError(Exception e)
   at System.Web.UI.Page.ProcessRequestMain(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint)
   at System.Web.UI.Page.ProcessRequest(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint)
   at System.Web.UI.Page.ProcessRequest()
   at System.Web.UI.Page.ProcessRequestWithNoAssert(HttpContext context)
   at System.Web.UI.Page.ProcessRequest(HttpContext context)
