June 17th, 2008
WordPress.com has added XML sitemaps so I thought I would take a glance at their implementation.
My immediate thought was to take a look at Lorelle’s sitemap.xml.
• Homepage daily priority
• Every other page updated on a weekly basis?
That seems like a good way to tell the spiders to index your site less often than they currently do.
With Lorelle you would certainly want spiders checking the home page hourly as she is sometimes the source of breaking news.
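A quick way to see what a sitemap is telling the spiders is to tally its changefreq values. The following is only a rough Python sketch: the sample XML and the example.com URLs are made up for illustration and are not the actual WordPress.com output, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace used by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample, NOT the real WordPress.com sitemap.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/</loc>
    <changefreq>daily</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>http://example.com/2008/06/16/some-post/</loc>
    <changefreq>weekly</changefreq>
  </url>
  <url>
    <loc>http://example.com/about/</loc>
    <changefreq>weekly</changefreq>
  </url>
</urlset>"""


def changefreq_counts(sitemap_xml):
    """Count how many <url> entries declare each <changefreq> value."""
    root = ET.fromstring(sitemap_xml)
    counts = Counter()
    for url in root.findall("sm:url", NS):
        # findtext returns the default when the element is absent.
        counts[url.findtext("sm:changefreq", default="(none)", namespaces=NS)] += 1
    return counts


counts = changefreq_counts(SAMPLE)
for freq, n in counts.items():
    print(freq, n)
```

If nearly everything comes back as weekly, as in the sample, the sitemap is hinting at a slower crawl than a busy blog would want.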
Then I looked at the sitemap with a little more detail, and in particular the entry for her most recent post, the Cyclical Nature of Blog Stats - a post worthy of a link anyway so this is a 2-in-1.
This entry was written by Lorelle VanFossen and posted on June 16, 2008 at 4:57 am
Ah, but I know Lorelle sometimes writes posts in batches and schedules them for publishing. Let’s look at the XML:

Wrong again - today is the 17th, Lorelle published a post on 16th June, which updated the home page, but it is not reflected in the sitemap.
Sometimes you might be better off with no sitemap at all…
5/10 for finally fulfilling a user request
1/10 for implementation (so far)
*Originally published at AndyBeard.eu
Posted by Andy Beard | No Comments »
May 22nd, 2008
Which other sites do Flash and other web 2.0 components trust? By running a Google search (or “Google hacking”) for the crossdomain.xml file, you can find out some very interesting things about which sites another site trusts, and where APIs or other trusted widgets, including advertising, can come from.
The Google hack is here: crossdomain.xml site:.com, or substitute the extension of your choice (.com, .net, .org, etc.).
This is the crossdomain.xml file from Twitter as an example:
<allow-access-from domain="*.twitter.com" />
<allow-access-from domain="*.discoveringradiance.com" />
<allow-access-from domain="*.umusic.com" />
<allow-access-from domain="*.hippo.com.au" />
<allow-access-from domain="*.ediecareplan.com" />
<allow-access-from domain="*.yourminis.com" />
<allow-access-from domain="*.korelab.com" />
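Once you have a policy file in hand, listing the trusted domains takes only a few lines. This is a rough Python sketch: the cross-domain-policy wrapper element follows the usual layout of these files, the sample is cut down from the list above, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Cut-down sample; a real file would be fetched from the site's /crossdomain.xml.
SAMPLE_POLICY = """<cross-domain-policy>
  <allow-access-from domain="*.twitter.com" />
  <allow-access-from domain="*.yourminis.com" />
</cross-domain-policy>"""


def trusted_domains(policy_xml):
    """Return the domain attribute of every allow-access-from element."""
    root = ET.fromstring(policy_xml)
    return [
        el.get("domain")
        for el in root.iter("allow-access-from")
        if el.get("domain")
    ]


for domain in trusted_domains(SAMPLE_POLICY):
    print(domain)
```

A bare "*" in that list would mean the site trusts Flash content from anywhere, which is usually the most interesting thing to look for.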
Read the rest of this entry »
Posted by Dan Morrill | No Comments »
April 18th, 2008
A few days ago I needed to write some functionality to fetch an XML document from a URL and load it into an XmlDocument. As always I use the WebClient to retrieve simple documents over HTTP and it looked like this:
using (WebClient client = new WebClient())
{
    // Download the document as a string, then parse it.
    string xml = client.DownloadString("http://example.com/doc.xml");
    XmlDocument doc = new XmlDocument();
    doc.LoadXml(xml);
}
I ran the function and got this very informative XmlException message: Data at the root level is invalid. Line 1, position 1. I’ve seen this error before so I knew immediately what the problem was. The XML document that was retrieved from the web had three strange characters in the very beginning of the document. It looks like this:
ï»¿<?xml version="1.0" encoding="utf-8"?>
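Three stray characters in front of the XML declaration are what you would typically see when a UTF-8 byte order mark (BOM) is decoded with a single-byte encoding. That the BOM is the culprit here is an assumption, since the excerpt stops before the explanation. The effect is easy to reproduce in a few lines of Python (used here instead of C# purely for illustration, with a simulated response rather than a real download):

```python
import codecs
import xml.etree.ElementTree as ET

# Simulated HTTP response body: a UTF-8 BOM followed by the document.
RAW = codecs.BOM_UTF8 + b'<?xml version="1.0" encoding="utf-8"?><doc/>'


def decode_xml(raw):
    """Decode UTF-8 bytes, dropping a leading BOM if one is present."""
    return raw.decode("utf-8-sig")


# Decoding the same bytes as Windows-1252 turns the BOM into visible characters.
garbled = RAW.decode("cp1252")
print(repr(garbled[:3]))

# Decoding as UTF-8 with BOM handling leaves a clean document that parses.
clean = decode_xml(RAW)
root = ET.fromstring(clean)
print(root.tag)
```

The same idea applies in any language: decode the response as UTF-8 with BOM handling, or hand the raw bytes to the XML parser, instead of decoding with the platform default.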
Read the rest of this entry »
Posted by Mads Kristensen | No Comments »
March 25th, 2008
All the major search engines (Google, Yahoo!, MSN/Live, and Ask) use the XML Sitemaps protocol for getting URLs from websites.
Of course they all still use good old-fashioned crawling, but the XML sitemap can be helpful for getting new content indexed quicker and also helping spot errors using other tools the search engines offer. Simply put, if you don’t have an XML sitemap already, we suggest you get one. XML-Sitemaps.com offers a free service that works very well, and you can also download (not free) a version to run on your own server.
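If you would rather generate the file yourself, the format is simple enough to build by hand. Here is a minimal Python sketch, assuming each page is described by a small dictionary; the function name, the dictionary keys, and the example.com URLs are illustrative only.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build a sitemap string from dicts with a required 'loc' key and
    optional 'lastmod', 'changefreq', and 'priority' keys."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = entry["loc"]
        for field in ("lastmod", "changefreq", "priority"):
            if field in entry:
                ET.SubElement(url, field).text = str(entry[field])
    return ET.tostring(urlset, encoding="unicode")


sitemap = build_sitemap([
    {"loc": "http://example.com/", "changefreq": "daily", "priority": 1.0},
    {"loc": "http://example.com/about/", "changefreq": "monthly"},
])
print(sitemap)
```

Save the result as sitemap.xml at the root of the site and submit it through each search engine's webmaster tools.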
Read the rest of this entry »
Posted by Michael Jensen | No Comments »
March 4th, 2008
All blog platforms send out pings using the XML-RPC protocol whenever a new post is created or an old one is updated. It is very simple to send out XML-RPC pings using C#, because it is just a normal HTTP request with some XML in the request body. See here how to ping using C#.
To send a ping is the client part of the transaction. There must also be a server to intercept the ping – an endpoint. Feedburner is probably the most widely used by blogs at the moment. What happens is that when you write a post, the blog engine sends a ping to Feedburner’s XML-RPC endpoint. In the XML body of the ping request is the URL of your blog so Feedburner knows who sent the ping. It then retrieves your blog’s RSS feed, parses it, and spits it out to your readers. That’s how simple and powerful it is to use XML-RPC pings.
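To make the "some XML in the request body" part concrete, here is a small Python sketch (Python rather than C#, purely for illustration) that builds the body of a ping without sending it. It assumes the conventional weblogUpdates.ping method, which takes the blog name and the blog URL; the helper name and the example values are hypothetical.

```python
import xmlrpc.client


def build_ping_body(blog_name, blog_url):
    """Return the XML-RPC request body for a weblogUpdates.ping call."""
    return xmlrpc.client.dumps(
        (blog_name, blog_url),
        methodname="weblogUpdates.ping",
    )


body = build_ping_body("My Blog", "http://example.com/")
print(body)

# An endpoint would decode the same body back into parameters and method name.
params, method = xmlrpc.client.loads(body)
print(method, params)
```

To actually send the ping you would POST this body to the service's XML-RPC endpoint, or let xmlrpc.client.ServerProxy do both steps; the endpoint URL depends on the service and is deliberately left out here.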
Read the rest of this entry »
Posted by Mads Kristensen | No Comments »
February 7th, 2008
When you come across something interesting on the web, but don’t have time to read it at that moment, what do you do? The old way is to add the web page to your browser’s bookmarks or favourites so you can retrieve it when you do have time.
A more recent method is to add that page of interest to a social bookmarking site such as del.icio.us.
This has two additional advantages - you can tag the page with keywords so you can share your bookmarked content with others, and you can access your bookmarked content from any other computer or device with a connection to the internet.
Read the rest of this entry »
Posted by Neville Hobson | No Comments »
January 4th, 2008
I’m a big fan of the Google Webmaster Central program and of using sitemaps. I agree that you should build your website so that it is crawlable and not rely on sitemaps to compensate for poor site architecture, but hands down there is no better tool than Webmaster Central when you are migrating a site or cleaning up after a migration. However, there’s a dark side to Webmaster Central that I haven’t seen anyone else bring up.
First we need to dive a little deeper into webmaster central and sitemaps. One of the subtle features of webmaster central is the priority tag in the XML file.
You are allowed to specify a value from 0.0 (lowest) to 1.0 (highest) for your pages. Now some people try to “trick” the search engines by giving all of their pages a 1.0, thinking this will result in an SEO benefit.
Read the rest of this entry »
Posted by Michael Gray | No Comments »
December 11th, 2007
This could be a milestone in illustrating the benefits of using XBRL for companies filing financial data. Last week, Microsoft submitted a Form 8-K filing to the US Securities and Exchange Commission (SEC) comprising financial data for shareholders, using XBRL.
Why is this significant?
First, it needs a bit of understanding as to what XBRL is. Simple explanation: it’s an emerging XML-based standard to define and exchange business and financial performance information. For more, read the technical description on Wikipedia.
Read the rest of this entry »
Posted by Neville Hobson | No Comments »
November 20th, 2007
Here is a mystery for folks. I’ve updated my parsing engine for coldfusionbloggers.org.
I’m using CFHTTP now so I can check Etag type stuff. I take the result text and save it to a file to be parsed by CFFEED.
But before I do that I check to ensure it’s valid XML. Here is where it gets weird. Charlie Griefer’s blog works with CFFEED directly, but isXML on the result returns false. But - I can xmlParse the string no problem. Simple example:
Read the rest of this entry »
Posted by Raymond Camden | No Comments »
November 20th, 2007
A few days ago I blogged about a code review I was doing for another client. Yesterday I found another interesting bug in their code. (It is always easier to find bugs in other people’s code.) Read the rest of this entry »
Posted by Raymond Camden | No Comments »