Data: Processing and more Processing

Pretty soon the mailman is going to be a pretty good friend of mine - every other day he seems to be bringing me something that needs my signature (usually a DVD/CD with more and more data).

What people often miss is that for maybe every record you see, there are 3-5 behind it keeping everything aligned. From timestamps on everything, to IPs on everything, to user IDs, reasons, sources, referrers, logs (a biggie) etc etc - it is of critical importance that everything be tracked.
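To make that concrete, here is a minimal sketch of what one of those behind-the-scenes rows might look like. The `AuditEntry` class, its field names, and the sample values are all hypothetical, picked only to mirror the list above (timestamp, IP, user ID, reason, source, referrer); this is not the actual iBegin schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEntry:
    """One bookkeeping row kept behind a visible record (hypothetical schema)."""
    record_id: int
    user_id: int
    ip: str
    reason: str
    source: str
    referrer: str = ""
    # Every entry is stamped with the time it was created.
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def entries_for(record_id, log):
    """Return every audit entry behind a single visible record, oldest first."""
    return sorted((e for e in log if e.record_id == record_id),
                  key=lambda e: e.timestamp)


# Illustrative data only: two entries behind record 1, one behind record 2.
log = [
    AuditEntry(1, 42, "203.0.113.7", "initial import", "state dump"),
    AuditEntry(1, 42, "203.0.113.7", "spelling fix", "manual review"),
    AuditEntry(2, 7, "198.51.100.3", "initial import", "state dump"),
]
history = entries_for(1, log)
```

With something like this in place, the microlevel question ("what happened to this one record?") becomes a simple filter, and the macrolevel trends are just counts over the same log.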

A multitude of reasons too - from hack attempts, to backups, to bugs in the system - it is extremely important that we know (if we need to) what is going on at the microlevel. Just as important are macrolevel trends - important when keeping an eye out to make sure things are working fine.

In the case of iBegin Source, each state has six tables (the data dump portion is half of one table). These tables are all important to make sure everything is lined up and synchronized. This does not include any of the raw data (which spans untold amounts of gigs and tables) or even the dev area. Heck, the search database has ~40k tables and 150 million entries - not the most efficient setup, but there are reasons for it rooted in scaling.

And to be sure - there is a ton of manual verification going on. From entries mis-spelt (cheap data-entry?) to condensing multiple entries to one - the business of data is not a fun one.
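Condensing multiple entries to one usually starts with a crude automated pass before a human looks at anything. Below is a rough sketch of that idea, assuming each entry is a dict with `name` and `address` keys (my assumption for illustration, not our real format). It only catches differences in case, punctuation and spacing; genuine mis-spellings still need a person.

```python
import re
from collections import defaultdict


def normalize(text):
    """Lowercase, drop punctuation, and collapse whitespace so near-identical strings match."""
    text = re.sub(r"[^a-z0-9 ]", "", text.lower())
    return " ".join(text.split())


def condense(entries):
    """Group entries by normalized (name, address) and keep the first entry of each group."""
    groups = defaultdict(list)
    for entry in entries:
        key = (normalize(entry["name"]), normalize(entry["address"]))
        groups[key].append(entry)
    return [group[0] for group in groups.values()]


# Illustrative data only: the first two listings are the same place typed differently.
listings = [
    {"name": "Joe's Pizza", "address": "12 Main St."},
    {"name": "JOES PIZZA", "address": "12 Main St"},
    {"name": "Joe's Pizza", "address": "98 King St."},
]
unique = condense(listings)
```

Keeping the whole group around (rather than throwing the duplicates away) is deliberate: it leaves a trail for whoever does the manual check.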

BUT - it is a double-edged sword (in a good way). We sell the data. And we utilize the data. I was on the phone today with someone who was very interested in our approach - I keep telling everyone that we do local (and thus - all of our ‘products’ are us doing local). I think other data providers (in fields not even related to local) are going to start following this lead - instead of just being an enterprise provider, why not also click with the consumer directly? The recent move by Yahoo to take its mapping in-house is a perfect example - it is exactly what we did. Instead of relying on a provider, we are going to the raw sources and amalgamating the data ourselves. This of course leaves the enterprise provider in a sickly situation - enter the consumer market, and piss off the enterprise customer. Don’t enter the consumer market, and hope the enterprise customers don’t leave you.

I’ll touch on this more in-depth later.


TechCrunch’s Arrington - Short short memory

I promise after this I will go out and criticize some sites :)

TechCrunch covers the ‘.CM scam’.

The choice quote:

This is actually one of the cleaner scams occurring in the extremely dirty domain name business.

and

And when money is thrown at these small countries, it seems that they have little hesitation in giving control of their namespace to a relatively unknown speculator.

Lest anyone forget - Pool.com:

Pool management team headed by President and CEO Michael Arrington

I remember seeing listings of TM domains in Pool’s upcoming list. I’m sure Pool picked up a lot of domains that had expired and had been previously websites. Was it not a ‘dirty’ domain business back then? Were domainers not ‘speculators’ then?

I only say this because - I don’t think domainers are evil. I do admit I am jealous of what they do. I do admit some cross ethical and moral lines (eg registering typos or disaster domains for pure profit). But quite a few own some decent domains (eg I sold beat.com to the .cm ‘scammer’ back in 2003). Anyway - I find it quite irksome when the grand crusader of ‘web 2.0’ was in charge of the largest domain catching service and now pretends like he had no relationship with it.


Google & FeedBurner: Double Whammy

So it seems like Google is acquiring FeedBurner for a smooth $100 million.

I like FeedBurner. They provide a useful function. Their support is fantastic. Even had the president email me when we had a few issues. The site exudes charm that is missing from most companies (including quite a few of ours, I will be honest).

So everyone has covered how Google is acquiring a very large audience. Just like Yahoo! locked in Flickr into its ad-network, Google is basically doing the same.

But while that is all obvious and what not, the other benefit is the ton of data that Google is getting (again - similar to what Yahoo did with MyBlogLog, but on a larger scale imo).

Google’s two big pushes in search have been relevance [removing spam] and personalization [give people what they want]. This is why products like Google Analytics and Google Reader haven’t been directly monetized - the entire point is to find out what people find useful and what they like. Google Analytics especially is scary - they know what pages people are visiting, what links they are clicking on, etc. Let’s say someone is searching for ‘food’ and two pages come up with near identical relevancy internally. Using Analytics, Google knows Page 1 is on a domain people find more sticky, and that people spend 3x the time on Page 1, while also clicking on 2x the links. So what is better for the user? (both can be argued - Page 1 in terms of longevity of user on the page, Page 2 in terms of user coming back quicker). I’m still surprised Yahoo beat Google to the punch when it came to Del.icio.us.

So the acquisition of FeedBurner gives them a ton more information. They know clickrate. They know the subscription # of a lot of sites. They can find information/data they previously did not know existed (which I touched on briefly).

I talk about covering the A to Z process for our properties - Google has done the same. They know what end-users are doing (via Google Analytics). They know what end-users are reading and finding interesting (via Google Reader). And now they know how popular blogs are, and what people find interesting on those specific blogs (via Google FeedBurner). And don’t forget they can track your ad habits now - not just via AdSense/AdWords, but also through DoubleClick.

Scary.


Maps + Wiki

We’ve been gearing up to launch our own wiki-system for places soon, but have run into a few issues.

So I wanted to take the time to showcase two places:

Wikimapia - the grand-daddy of map/wiki sites. A ton of rich content. But unlike more traditional ‘wiki’ sites, everything is locked in. As of right now, the site claims 3,703,330 places.

ShapeWiki - A lot cleaner in approach, I really like this website. Has some fantastic export options. Especially intelligent is how they do their mapping - adding a point between two points allows for a lot more flexibility, and was a very intelligent design choice.


Interesting: Franchising Local

I came across Mini Cities late yesterday, and it is an interesting model.

The basic gist is simple. Properly covering a large area of local is difficult - the manpower required is quite intensive. So - why not build a stable platform/system that allows a person to quickly setup a website for his/her local area?

The definition of local seems to be amorphous here, as ‘New Tampa’ is not really a city per se, more of an area/locale (if you live there, you know what it is). As such, this means it is exclusively geared for local residents (about 25% of our traffic on iBegin Toronto are tourists).

The sites themselves are not bad. They have made the urls fully search engine readable (though the actual url structure is odd - eg seems to include ‘restaurants’ and ‘coupons’ in all of them). The standard items of local interest are here - events, business listings, coupons.

The ads (where I expect most of the revenue will come from) are all on-site links to locations with coupons. I think of this as a shrewd move - having people using these coupons with local businesses proves to the business that there is viable traffic and leads - much easier to sell ads later.

I do wish the sites were more customized design-wise. I did notice that the logos are unique, but otherwise the designs are identical. I think it would be a great idea if they had a dozen or so templates - franchises will want a bit more control over the design, and being locked into one standardized one isn’t that. The events also need a calendar - a listing is nice, but I (along with many others I am sure) prefer a calendar-view.

Regardless - the question at the end is how well the company will be able to sell franchises (and for how much). I am sure people are itching to try their hand at making money from a website, and the ability to do it locally seems very appealing to me. Time will tell if they have the right ingredients.


The iBegin Weather Widget: Powering it Up

I said I would come back to this.

We released iBegin Weather on May 10.

The site isn’t amazing. It isn’t ground-breaking. What it is is useful. It is clean, fast loading, and gets the damn point across (the weather) with a minimum of intrusions and confusion. I’ll do a comparison with other websites another day :)

So to me it is a form of evolution - taking data and presenting it in a more viable way.

So on the first day of operations we got 148 unique visitors. These were greatly driven by my own sites - from blogs like ForeverGeek and Blogging Pro, to our own iBegin Blog. The next few days it leveled off a bit (~125 unique visitors a day), after which it has gone up every day since. Yesterday it did 421 unique visitors.

I realized early that weather is a personal thing - people not only want to read it quickly, they want a quick way of getting it (be it RSS) or even displaying it on their site (widget).

So we spent quite a bit of time on the widget. We made sure it was fast (cached). We made sure it was customizable (a lot of options). And we made sure (again) it was simple - no flash interface, no heavy-graphics loading, none of that crap.
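For the curious, "fast (cached)" boils down to something like the sketch below: render the widget once per city, then serve the stored copy until it goes stale. This is a simplified illustration and not our production code. The `TTLCache` class, the `render_widget` function, the 10-minute lifetime and the sample output are all made up for the example.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be demonstrated without sleeping
        self._store = {}    # key -> (time stored, value)

    def get(self, key, compute):
        """Return the cached value for key, calling compute(key) only on a miss or expiry."""
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        value = compute(key)
        self._store[key] = (now, value)
        return value


calls = []


def render_widget(city):
    """Stand-in for the expensive part (fetching weather data and building HTML)."""
    calls.append(city)
    return f"<div class='weather'>{city}: 21C</div>"


# Use a fake clock so the example runs instantly.
fake_now = [0.0]
cache = TTLCache(ttl_seconds=600, clock=lambda: fake_now[0])

first = cache.get("toronto", render_widget)   # miss: renders
second = cache.get("toronto", render_widget)  # hit: served from cache
fake_now[0] = 601.0
third = cache.get("toronto", render_widget)   # expired: renders again
```

The trade-off is freshness: with a 10-minute lifetime the weather shown can be up to 10 minutes old, which is fine for a forecast and saves a render on nearly every request.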

And then we promoted it. We emailed a few bloggers. We bought a few paid reviews (which led to a crappy experience). We did PPC.

In 9 days, I can only chalk this up as success. The widget is driving traffic and searches to the site. Yahoo reports 28 backlinks to the frontpage, but 1156 links to the entire domain. We have almost 150 sites using the widget (most of which don’t exist to Google). Our #1 referer is a high school in Kentucky!

This post is powerful. Stop for a moment and think about it. We found an enterprise provider of some essential data, we put up a clean website for it, and we built an intelligent widget system that is both customizable and loads blazing fast. Then, by spending a little time on emailing/paid reviews (~5 hours) and PPC (very little ongoing cost - I am stunned by how little widget promotion is done through PPC), we are building backlinks and traffic at a fantastic rate. The site has almost 10,000 pages in Google (in 9 days, no sitemap).

If you wanted a recipe for a site where you drive non-SE traffic while also driving SE-traffic, this is it.


Google: Not as expansive as we think

The iBegin Weather Widget is doing pretty damn good, already on 100+ pages (more on that for a future post). But what is amazing is that while a lot of new sites are using it (ie < 6 months old), Google doesn't know that most of them even exist (site:domain.com yields nothing). I'm talking about an eclectic mix - school pages, personal pages (both ISP hosted and university hosted), real estate pages, and even municipal pages (eg police department).

If in only 8 days I have come across dozens and dozens of pages that don’t ‘exist’ (in Google … or any of the other SEs for that matter) - how much of the web isn’t found? I’m not even talking about semi-confusing sites with frames and redirects - these are straight-up extremely simple HTML pages.

Perfect time to remind you that unique content is not enough - promoting it is equally (if not more) important.


iBegin v3

Thought some of my readers would be interested in the upcoming iBegin v3.


Web 2.0 IM Sites: Screwed

I’ve been around for a while. I’ve dabbled in many many markets. And by far the worst market I worked in was games (the one exception being virtual items/currency).

Game sites pull some horrible CPM rates. The reality is that it is a perfect storm of crap-traffic - skews young, skews ADD, skews impatient, skews banner-blind. This translates into poor results: tons of pageviews, few clicks. Branding may work best here - if they notice the ads.

The problem becomes simple - so much bandwidth is sucked up that the amount of revenue generated by ads doesn’t cut it.

So I shake my head when I see all these IM sites popping up. Not only are they piggy-backing (in a bad way) on other networks, the ad market simply won’t support something so spastic as IM. I was recently at Kool IM and all I saw were Google AdSense ads (where do you get contextual relevance?) and annoying flash-banner ads.

You want a recipe for disaster? Start up a web-IM company that piggybacks on AOL/Microsoft/Yahoo/Google’s networks.


Widgets: Middleman, not the end-all

Between the BlogFlux/BlogTopSites merger and a stomach bug, I haven’t been able to get much done.

And yet - while I wasn’t doing much directly, the system itself was working flawlessly. The best example: widgets.

Lots of talk these days about how awesome super fantastic widgets are. Yet very little of that talk has been spent on the downsides - how a widget is a JS call (and thus extra HTTP calls), how it basically hammers the widget-server, how the widget-server going down can make your website unusable (do you really trust the widgets you are using?), how you have no control over what data is acquired.

If the path from ‘Visitor’ to ‘Customer’ (ie makes you money) is from A to Z, the widget itself is somewhere in the middle.

Widgets are also useful in two ways:
1. Generating direct traffic to your website.
2. Generating backlinks to your website.

What I’ve done (with great success) is let #1 be optional, and let #2 be mandatory. The benefit of doing it this way is that the webmaster (who will implement the widget) has the power to do what he wants. When it comes to compelling reasons to use a widget, not having to link back (in any noticeable way) is way at the top.
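Here is roughly how I picture that split in the embed code a webmaster copies. It is only a sketch. The `embed_snippet` function, the example.com URLs and the exact markup are hypothetical stand-ins, not the code we actually hand out. The small backlink is always included; the prominent "send me your visitors" link is something the webmaster opts into.

```python
from html import escape


def embed_snippet(widget_url, site_url, site_name, show_traffic_link=False):
    """Build copy-and-paste widget HTML: backlink always, visible traffic link only on request."""
    parts = [f'<script src="{escape(widget_url, quote=True)}"></script>']
    # #2 (backlink) is mandatory, but kept small so it does not take over the page.
    parts.append(
        f'<a href="{escape(site_url, quote=True)}" style="font-size:10px">'
        f"{escape(site_name)}</a>"
    )
    # #1 (direct traffic) is optional and left up to the webmaster.
    if show_traffic_link:
        parts.append(
            f'<p><a href="{escape(site_url, quote=True)}">'
            f"Full forecast at {escape(site_name)}</a></p>"
        )
    return "\n".join(parts)


# Placeholder URLs for illustration only.
default = embed_snippet("https://example.com/widget.js", "https://example.com/", "Example Weather")
with_link = embed_snippet("https://example.com/widget.js", "https://example.com/",
                          "Example Weather", show_traffic_link=True)
```

Making the optional part a simple flag keeps the default snippet as unobtrusive as possible, which is the whole selling point for the webmaster.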
