My client's engineering team has an internal Twiki site (running on a Linux/Apache configuration) that's been up and running for years. We recently installed SharePoint 2007 Enterprise for the client and are trying to configure
the SharePoint search to crawl the internal Twiki site.
FYI: Twiki, for those of you who don't know, is an open source wiki solution. You can read about the product here: http://twiki.org/
Anyway, we set up a new website content source via SharePoint Central Administration (CA) that points at the Twiki site and kicked off the crawl. No problem there. The crawl fired up and went to work.
Here's the stumper. More than 24 hours later, the crawl was still running. That's right, it ran for over 24 hours and still did not complete. Assuming that something was wrong, we went ahead and stopped the crawl.
While the crawl was running, we saw the following behavior in the crawl log:
- It successfully crawls a little over 1,600 pieces of content on the Twiki site.
- It produces over 250,000 warnings with this message: "Content for this URL is excluded by the server because a no-index attribute".

This is stunning. I'm not a Twiki expert, but I find it hard to believe there are really
250,000 pieces of c
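
For anyone who wants to sanity-check a warning like the one above, here is a minimal Python sketch. It assumes the "no-index attribute" in the message refers to a robots meta tag (`<meta name="robots" content="noindex">`) in the page's HTML, which is an assumption on my part and not something confirmed in the crawl log. The names `RobotsMetaParser`, `has_noindex`, and `url_has_noindex` are made up for illustration and are not part of SharePoint or Twiki.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() == "robots":
            for directive in attr_map.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html_text):
    """Return True if the HTML carries a robots meta tag with 'noindex'."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return "noindex" in parser.directives


def url_has_noindex(url):
    """Fetch a page and check it (hypothetical helper; needs network access)."""
    with urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return has_noindex(response.read().decode(charset, errors="replace"))
```

Running `url_has_noindex` against a handful of the URLs listed in the warnings would show whether those pages really ask crawlers not to index them, or whether the crawler is reacting to something else.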