The title I chose pretty much sums up what Original Signal does.

On the face of it, it is one very purty website. There are no ads, the site displays headlines from the most influential blogs in each category, and it even has a nice popup that shows only a teaser of the content (even when the source publishes a full feed). Their only source of monetization seems to be the search box at the top: it uses a Yahoo feed, and the top links are sponsored. Some people may have problems with that (after all, none of the content is ‘theirs’), but I do not really mind.

No siree - my problem is with their links.

The first link I saw was to TechCrunch, and as is a habit of mine, I looked at the URL. Instead of a simple direct link or even a tracking link, it's a full-blown URL: http://web20.originalsignal.com/article/4845/stalk-your-contact-list-with-upscoop.html

Why do they need such a URL? Hrmm, I thought, maybe to prettify it. So I checked for a robots.txt file - nada. Nothing, zilch. This was odd - a full-blown internal link, and no notification to robots that they should not be spidering the page.

Only one reasonable explanation, and one easy check: see how many pages Google has indexed for the site.

And there we have our beautiful back-stab. You find URLs like http://buzz.originalsignal.com/article/431824/acer-computer-pdoduct.html and http://movies.originalsignal.com/article/14405/carmen-loves-praying.html (among many others). Google clocks the site in at over 3,000 pages. I do not follow SEO news much now, but a while back there was a huge stink about 302 redirects. Basically, sites were using 302 redirects for outbound links, which got end users to their destination but confused search engines. When a site with a lot of trust/PageRank (i.e. Original Signal) did this, search engines would often rank the offending site's URL and obliterate the original site with a dupe penalty.

And this is exactly what Original Signal is doing. They could link directly to the source (like popurls). If they really wanted to track clicks, all they had to do was link to something like out.php?id=xxx or /out/xxx. They could then block it from robots (so that spiders wouldn’t get confused) or use a proper 301 redirect. Nope - they instead chose to build a full URL scheme. Users get absolutely nothing out of it. Search engines spider the pages.
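
For the curious, here is a rough sketch of what the /out/xxx approach could look like. It is only an illustration, not how Original Signal or popurls actually work: the LINKS table, the example.com target, and the function names outbound_response and crawler_may_fetch are all made up for this example. It combines both options mentioned above (a robots.txt block and a 301 redirect), although either one alone would already be an improvement.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical lookup table: tracking id -> original article URL.
LINKS = {
    "4845": "https://example.com/stalk-your-contact-list-with-upscoop",
}

# robots.txt rule that tells spiders to stay away from the tracking URLs.
ROBOTS_TXT = """\
User-agent: *
Disallow: /out/
"""


def outbound_response(path, links=LINKS):
    """Return (status, headers) for a request to /out/<id>."""
    prefix = "/out/"
    if not path.startswith(prefix):
        return 404, {}
    target = links.get(path[len(prefix):])
    if target is None:
        return 404, {}
    # The click would be logged here. The 301 says "the content lives
    # over there, permanently", so the source keeps the credit.
    return 301, {"Location": target}


def crawler_may_fetch(path, robots_txt=ROBOTS_TXT, agent="Googlebot"):
    """Check whether the robots.txt above lets a crawler fetch the path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    print(outbound_response("/out/4845"))
    print(crawler_may_fetch("/out/4845"))
```

The handler is a plain function rather than a real web server so the idea stays visible: one short URL per link, a permanent redirect to the source, and a robots.txt rule that keeps the tracking URLs out of the index.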

Congratulations on helping pollute the web.