With big sites like Blog Flux, when you deliver a service, you better have it ready for scaling.
We cache every single pagerank request to our Pagerank Checker. Obvious reason - PR updates rarely, and when it does, we can just flush the cache. In the meantime, if we keep asking Google the pagerank every time we get a request, we end up with 1) slower response time and 2) higher chance of being banned by Google.
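A minimal sketch of that cache-then-flush idea, assuming an in-memory store. The class name and the `fetch` callable are hypothetical stand-ins for illustration, not the actual Blog Flux code, which the post does not show:

```python
from typing import Callable, Dict


class PagerankCache:
    """Tiny in-memory cache; `fetch` is a hypothetical stand-in for the real lookup."""

    def __init__(self, fetch: Callable[[str], int]) -> None:
        self._fetch = fetch
        self._store: Dict[str, int] = {}

    def get(self, url: str) -> int:
        # Only go upstream on a cache miss, so repeat requests stay fast
        # and the upstream service sees far fewer queries.
        if url not in self._store:
            self._store[url] = self._fetch(url)
        return self._store[url]

    def flush(self) -> None:
        # Meant to be called after a PR update, when cached values go stale.
        self._store.clear()
```

Every repeat request for the same URL is then answered from the cache until the next flush.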
I ran the math - we have a total of 33 gigs cached on our server. At an average size of 225 bytes per PR image (0-10), that comes out to almost 150,000,000 different URLs checked for pagerank by us! This is only a count on unique URLs - since the last pagerank reset we have delivered over 10 billion images.
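The estimate can be reproduced in a couple of lines, assuming "33 gigs" means 33 x 10^9 bytes (decimal gigabytes) and ignoring any filesystem overhead:

```python
# Assumption: decimal gigabytes, no per-file overhead.
CACHE_BYTES = 33 * 10**9
AVG_IMAGE_BYTES = 225  # average PR image size quoted above

unique_urls = CACHE_BYTES // AVG_IMAGE_BYTES
print(f"{unique_urls:,} cached URLs")
```

That works out to roughly 146.7 million, which is where the "almost 150,000,000" figure comes from.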
I don’t remember the exact date, but this thread puts the last PR update at January 9. We have had roughly 90 days elapse since then. Number of images we’ve served up:
Average response time is roughly 0.10 seconds.
For the domains we have PR cached for, the most popular first letter was ‘s’, followed by ‘m’ (way behind). The least popular was ‘q’, with ‘z’ about 2x more popular. ‘q’ was roughly 2.4% as popular as ‘s’.
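A tally like this is easy to produce. The sketch below is one possible way, assuming the cached domains are available as plain strings and that a leading "www." should be ignored; the sample domains are made up:

```python
from collections import Counter
from typing import Iterable


def first_letter_counts(domains: Iterable[str]) -> Counter:
    """Count domains by their first letter, skipping a leading 'www.'."""
    counts: Counter = Counter()
    for domain in domains:
        name = domain.strip().lower()
        if name.startswith("www."):
            name = name[4:]
        # Domains starting with a digit or symbol are left out of the tally.
        if name and name[0].isalpha():
            counts[name[0]] += 1
    return counts


# Made-up sample data, purely for illustration.
sample = ["sample-one.example", "www.second.example", "mysite.example", "quiet.example"]
counts = first_letter_counts(sample)
print(counts.most_common())
```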
Easy explanation: Its indexing is faster and smarter
Case in point: tracking spider hits for a PR7 site with 1+ million pages:

Just compare those numbers. Yahoo! has over 5000 links to the index page - and yet it can’t even crawl 100 pages in a day?
That’s only half the story. How about the intelligence of the crawling (the following image is hits to the index page by the search engine spiders):

Yahoo! and MSN are obsessed with the frontpage - Google checks it out a few times a day.
Google’s spider isn’t just better - it’s far smarter.
Update: At the end of the day, Googlebot clocked in 68,162 hits, Yahoo 37 (5 to the index page), and MSN 4.
Update 2: Just broke the 100k mark.
Note: This is part of a ‘three-setter’ on internet morals and what not:
While very misunderstood (and hated by a lot), the domain industry is on fire. With big sales like Vodka.com for $3 million, and with people like Howard Schultz (CEO of Starbucks) investing in companies investing in the domain space, you can only imagine it is going to keep growing.
Before you roll your eyes - while they will never admit it publicly, Yahoo! execs have unofficially admitted that domain names generate roughly 10-15% of their revenue. That is - billions.
One of the lead proponents of domains and how super amazing they are is Rick Schwartz, the self-proclaimed ‘domain king’ (and also ‘webfather’, which has to be the most ridiculous and inaccurate nickname ever).
Anyhoo - Rick has a very boisterous attitude. The best way to sum up his outlook is ‘you are either with us or against us’ - there is no middle way.
He is also a pretty shrewd businessman. He helped found the TRAFFIC conference, dedicated to domain names and how super-awesome they are. He also sold Men.com for a purported $1.3 million.
Rick also operates the TRAFFIC forum, a forum for established domain owners. We are talking about some big guns here, people making millions a year. One of the members is Frank Schilling (a very smart person; his blog is an excellent read. Consider him the anti-Rick).
I was a part of that forum. Keyword ‘was’. Recently there was a kerfuffle when a board member posted on some other forum as ‘domainking’. Ever diligent about his ‘TM’ (more on that later) of ‘domainking’, Rick did what he does best: go ballistic. While I don’t want to delve into the soap opera that ensued, he not only banned the offending person, but also Donna Mahony, who was just trying to mediate some calm. There is a difference in being a ‘forum administrator’ and a ‘tyrant’, and alas our good friend had moved into the tyrant phase. A member called for a walkout, and I participated.
Now I want to take a moment to pause - a walkout is when you leave something in protest. It was an institutional standard during the Vietnam War, but alas nowadays it has morphed into an ‘evil act.’
We were subsequently banned (at the end of the day, nine of us were removed).
The ensuing ruckus (I was told) was quite hilarious. Lots of capitalized words, lots of !!!!!!!! - you know the drill (if you’ve ever talked to a preteen girl).
Anyhoo (I was setting the background) - Rick considers himself to be a domain pioneer, not only in the domain names he has, but also in ‘domainer rights’. I do want to say that I believe if you bought Green.com (and there is no TM company named Green) you damn well have the rights to it. A recent case was MSG.com being reverse-hijacked by some company with deep pockets. It is stealing.
Regardless - the moral hypocrisy: voyuer.com. Voyuer.com, a typo for voyeur, was bought for an astounding $112,100. Our good friend Rick owns the domain voyeur.com. To say he was angry is to understate the obvious, never mind the fact that it was a typo of a generic word (and that voyeur.com itself was nothing more than a parked page). He did what anyone would do - file suit, claiming he had a TM on that word.
The self-proclaimed ‘Domain King’, who ‘fights for the rights of domainers’, tries to reverse hijack the domain. From the panel:
The hurdles for showing that a generic term like “voyeur” has acquired a secondary meaning are high – Complainant has not cleared even the first hurdle.
Nothing more than a greedy plan. What really irks me is the entire process - while Rick could afford the couple thousand it would cost him to file the UDRP (the process for resolving domain name disputes) and so could the respondent, what if the respondent was someone who couldn’t afford it? The complainant would walk away with a valuable domain, with minimal hassle.
This all ties into my previous post on moral relativism. The moment someone starts imposing their ‘morals’ on someone else is when things start to fall apart. A lot of these people are willing to smile at you while they try bleeding you dry.
After all, what do you do when you have a lot of money? Make more of it.
UPDATE: As is often the case with written text, some people are getting the wrong idea. This isn’t about Rick in person (he seems to be an okay guy, just has some problems with temper and criticism). This is an issue about morals - the moment you let someone else dictate what is moral and what isn’t (only a few things in life are truly black and white), that is when you run into trouble. Rick was simply an easy example - while talking about domainer rights, he tried to do what he ‘crusades’ against. Be independent please - be your own judge (i.e. while I respect Frank and Dean a lot, they aren’t my source of right & wrong).
I don’t like telling our bloggers what to write about. I do point out mistakes (including logical ones), but by and large, they can talk about wherever the wind takes them. That is, except for politics. Talk about a big no-no (do it on your personal blog, not on a company blog).
So I was a little … taken aback when I read about investing in anti-terrorism mutual funds. It wasn’t general distaste - the idea of investing in such a fund (and the subsequent post about buying into theft insurance) is actually quite smart.
What I wasn’t really happy with was the half-preachy principle. The US government sends people to Syria to be tortured. US companies with large government contracts do business regularly with Iran. If you want to make a change, my opinion is you should fix those holes first. That didn’t sit well with me, so I was compelled to post.
I wasn’t happy with the response. Equating not liking someone (which I never even mentioned) to hating something is an old, tired, and cliched argument. And for those that still don’t get it, let me outline it:
The opposite of love is not hate, it is indifference
I won’t go any further than that, except to close with this: you reap what you sow.
I saw this: is it time for a blogging code of conduct? and I just shook my head. Thankfully, one of my favorite bloggers spoketh on this blogging code of conduct.
I posted the following comment on his site, and it sums up my feelings:
Sigh … “unwritten rules of blogging” - what unwritten rules?
The moment someone tries imposing their moral beliefs (nicely called ‘code of conduct’) is the moment they can go fuck themselves.
Sorry for the language, but this holier-than-thou attitude some people carry around like a nightstick gets annoying. Much less the reality that anyone who does send out a death threat won’t stop and go “oh wait a minute, these are against the code of conduct. Shit, can’t send this now!”
There are two things at play here:
1. The ‘abuse’ of internet anonymity is well known. Penny-Arcade covered this perfectly. People will act like jackasses online. I’ve been called every single racial epithet ever known to man (and more). But here is the thing - if someone does throw out a death threat, that is a matter for the police. It isn’t a matter for bloggers. It isn’t the job of ‘A-list’ bloggers. It doesn’t matter what medium the threat was posted on. Unless of course you are on TV, where death threats are passe (I’m looking at the radical wings of both ‘mainstream’ parties here in the US).
2. The imposition of beliefs on others who do not want them is getting exhausting. The death threats I have already covered above. But other than that - bugger off. If what I say does not harm you, and you don’t like what I’m saying - get lost. Why do people have this craving to whine about anything and everything that may offend them? People will use any opening to push their (moral) agenda. Just like the ‘Patriot Act’ was used to push some pretty un-patriotic things (more about love & hate in the next post). I remember getting a call from some group trying to raise donations for the ‘Anti-pedophile bill’. When I asked for details about what it actually did, the woman got offended! How dare I ask - can’t I see the title? These hysteria- and reaction-driven pushes for morality have got to stop.
I haven’t even gotten into the whole mess that is the internet vs country-specific laws.
Human rights are about protecting the minority that don’t have the majority to push whatever they want.
Synergy was an awesome buzzword. Not only does it mean something, it just sounds awesome. If you had synergy, you couldn’t go wrong.
People have often asked me - is there something wrong with me? Do I have ADD? (not to imply ADD is wrong, but you get the idea). Why can’t I just *focus* on one thing and go with it?
The answer is simple: having resources in various areas makes it much easier to ‘push’ a product.
Case in point: Blog Flux Local
While not launched, the entire project is a very daunting task - we essentially want to catalog local content, and geocode it to where it belongs. Similar to outside.in, but really - more simplistic.
So one of our initial problems was - how do we figure out what place a post is about? We can attempt to parse out street intersections etc, but that is haphazard. We can ask people for GPS coordinates (FeedBurner supports this) - but who the hell is gonna figure that out?
The simple truth is that we associate places with names (or even street intersections). I would say “McDonald’s near Elm and Queen Street”. I wouldn’t say ‘131 Elm Street’ or ‘23.2352, -115.234234’. Now to be able to do something like that, we need both the business data and the geocoder.
And so now in comes iBegin Source and iBegin Geocoder (launching soon). We already have support for linkage on iBegin Source - basically you link to that specific page, and we link back (right now you have to manually add the link, but we are working on a trackback system for that). Example: Best Vet Inc in Boynton Beach, FL.
We know the post is about XXX, we know that XXX is located in YYY - so now we know where all of this is.
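As a rough sketch of that chain (post links to a business page, business page has a known location), assuming a simple lookup table keyed by listing URL. The URL, the `Business` type, the `locate_post` helper and the approximate coordinates are all hypothetical placeholders, not the real iBegin data or API:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Business:
    name: str
    city: str
    lat: float  # placeholder coordinates, approximate only
    lon: float


# Hypothetical listing index: business page URL -> business record.
LISTINGS: Dict[str, Business] = {
    "https://source.example/best-vet-inc": Business(
        "Best Vet Inc", "Boynton Beach, FL", 26.53, -80.09
    ),
}


def locate_post(post_links: List[str]) -> Optional[Business]:
    """Return the first linked business we know about, if any."""
    for link in post_links:
        business = LISTINGS.get(link)
        if business is not None:
            return business
    return None
```

A post that links to a known listing inherits that listing's location; a post with no matching link simply stays unplaced.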
The next challenge then is to introduce bloggers into this system. And that is where Blog Flux’s fantastic reach comes into play. Almost 31,000 blogs approved, and over 72,000 registered users. Throw in Blog Top Sites with another 30,000 members (50% overlap with Blog Flux), and you now have the potential to reach 87,000 users about this service (by the time we launch it should be 90,000). Blog Flux is going from strength to strength (just peaked at 45,000 pageviews a few days ago) - this will just push it further along.
I’m not going into more details about how we are presenting the data and so forth (aha!), but this should give an idea on how having multiple established brands can be a good thing. Do remember that both iBegin and Blog Flux have their own staff, so it’s not like you can just set up two brands and enjoy. It takes time to do that too.
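The 87,000 figure follows from the numbers quoted, assuming the 50% overlap means half of the Blog Top Sites members are already Blog Flux users:

```python
blog_flux_users = 72_000
blog_top_sites_members = 30_000
# Assumption: "50% overlap" = half of Blog Top Sites members also use Blog Flux.
overlap = blog_top_sites_members // 2

combined_reach = blog_flux_users + blog_top_sites_members - overlap
print(f"{combined_reach:,}")
```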
I just wanted to point this out …
I keep seeing those damn ads where they talk about how they spend 10000 engineering hours to improve the best-in-class-awesome-towing-capacity-insane-fuel-efficiency truck.
But is it all that impressive?
Based on 40 hours a week, the average employee (working 50 weeks a year) works 2000 hours a year. So when it comes to 10000 hours, really you had five employees work on this for a year.
When you have thousands of employees, five just isn’t impressive.
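The same back-of-the-envelope calculation, using the 40-hour week and 50-week year assumed above:

```python
HOURS_PER_WEEK = 40
WEEKS_PER_YEAR = 50
advertised_hours = 10_000

hours_per_employee_year = HOURS_PER_WEEK * WEEKS_PER_YEAR
employee_years = advertised_hours / hours_per_employee_year
print(employee_years)
```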
We get all sorts of legal threats and what not.
The latest one (in a long line of bizarre ones) comes from ISPhost.org. Yet another web hosting company out there (with what has to be one of the ugliest designs out there) sent us a nice little letter threatening to sue us. It was boilerplate: ‘We are from XX, how dare you track us and attack our servers, cease or we will sue.’
A few things that really bug me about this:
Just another case of over-reaction (and I know full well both what DDOSes feel like and what running a server infrastructure entails).
The newest local search site has arrived, and this time it’s Canadian: ZipLocal
My standard practice of welcoming a new site is to usually point out about a dozen mistakes they have committed (usually over at Greg Sterling’s blog). But things change, and this time I have my own blog - huzzah!
Furthermore, in terms of disclosure, the investing company behind ZipLocal was interested in our iBegin product. I met them. But I’m reviewing these guys because - well, I review all local sites.
So, things that ZipLocal needs to fix up:
Just because of the Wal-mart debacle, I give it a D-
Whew - that turned out to be a long one. They should be paying me for this.
We operate a plethora of sites. Some are updated a dozen times a day, some once a year.
The sites that generate the most amazing ROI are usually the ones that are most targeted. For example, we have a rarely updated site based loosely on finance. Today it generated a paltry 28 pageviews. The earnings for today: $18.18.
While that was a freak day for earnings, the site averages a CPM above $100 (both this month and lifetime - it has been around since early 2005).
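For scale, here is that one day expressed as a CPM, assuming CPM here means earnings per 1000 pageviews (the post does not define it):

```python
earnings_today = 18.18  # dollars
pageviews_today = 28

# Assumption: CPM = earnings per 1000 pageviews.
cpm_today = earnings_today / pageviews_today * 1000
print(f"${cpm_today:.2f}")
```

That comes to roughly $649 for the day, several times the site's already high average.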
Don’t pass on something just because it doesn’t have the potential for 10 million pageviews. You can get by with 1000 sometimes.