GeoSign or eMedia?

I was looking over Restaurantica’s About page when I noticed that it was owned by ‘eMedia’. I had previously met the owner of Restaurantica, who (at the time) was an employee of GeoSign, and a few months later I heard the site had been sold to GeoSign itself. The About page also says that eMedia owns TrueLocal, GolfCourses.com, etc.

Color me confused - what is going on here? eMedia.com has nothing on it, and the GeoSign page shows business as usual.

Update 1:

From http://www.linkedin.com/in/vfilby:

eMedia is a publishing company that took over many of the marquee domains held by Geosign. Our goal was to make TrueLocal more competitive with other local search players by improving performance and moving towards a social Web 2.0 ideal.

I’m not sure what taking over marquee domains (eg GolfCourses.com and Hockey.com) has to do with local search.

Update 2:
Part of an ongoing discovery of mine: it seems TrueLocal also uses Lucene, the same search library Yelp uses.

Update 3:
An interesting oddity turned up in Google Blog Search: the second result is an Oct 3 post from rmay.ca. The link itself is dead (http://www.rmay.ca/16/), but Google’s cache has it:

Now that the news of our rebranding is public, I now work for Moxy Media as Manager, Special Projects. It’s a little sad to see the Geosign name go away after six years of hard work helping build it, but similarly nice to have a fresh moniker to stand behind as we move forward in a series of new directions. If you have me in your address book please update my e-mail address as it is now rmay [at] moxymedia.com.

And lo and behold: Moxy Media, a red version of the GeoSign homepage.

So I can only conclude that GeoSign has split in two: Moxy Media (‘content’ sites) and eMedia (100k or so domains + TrueLocal).

Update 4:
At the same time, this job posting says:

Emedia (a Geosign company) is an internet media company, poised to launch sites such as Hockey.com and Golfcourses.com

So if eMedia (a subsidiary of GeoSign) is doing content publishing (à la Hockey.com), what exactly does Moxy Media do?

118 and Judy’s Book?

Thought this was sort of odd …

I saw Sebastien’s post about 118.com launching in the US. While doing some random searches, I noticed that their ‘default’ website image seems to come from Judy’s Book.

Eg: food (both #1 and #10 have that as their icon).

Odd.

So I was perusing some stats for iBegin Source, and saw that the last person who had downloaded our data had come via a search for ‘local business data’.

Looking through the results, I noticed a post on Webmaster World titled “Good Source For Local Business Data?”

Lo and behold, it redirected to this page, which had been posted roughly 24 hours earlier. And sure enough, someone had mentioned iBegin.

This was both amazing and frightening. Amazing that not only had Google indexed it so fast, but that other people were now mentioning iBegin. Frightening too: there was no easy way for me to know iBegin had been mentioned. This was especially important because tennis_fan28 was slightly incorrect: it wasn’t 50k for the full US, but 40k (not a big deal, but still accidental misinformation). It wasn’t picked up on blogs. There was no link for me to find it through referrals. BoardTracker (imo the best bulletin board search engine) missed it by a mile. The only thing that would have caught it was Google’s advanced search option (where you can restrict results by the date range in which a page was first found). Unfortunately this has two problems: 1) it finds a lot of junk/redundant stuff (eg anything on the ibegin.com domain that is new to Google), and 2) it only works for *new* pages, so a forum thread started a while ago but with a new mention of iBegin would slip through.

Anyway, what eventually happened was that I posted in two separate threads where iBegin was mentioned, and the next day the threads were gone. It turned out they had been flagged for review, and I don’t blame the moderators: it did seem very convenient. The posts were restored the following day: “anyone try to crawl Superpages.com?” and “Good Source For Local Business Data?”

I read an excellent post on structured vs unstructured data in the local space.

The problem with local data is an impossible human problem: people think differently. What is beautiful to me could be ugly to you. What is a kebab to me could be a skewer to you. What is a car to me could be a piece of trash to you, and so forth and so forth.

On a related blog post, there was a discussion about building a better database. I’m not sure what Yellowbot was doing there (they just use Localeze data), but I am glad they were.

The entire argument for using a tagging system as your ‘base’ is shortsighted, mostly because (as I explained) people don’t see things the same way. My previous examples were more generic; it gets even more confusing at the local data level. Is it a ‘gas station’ or a ‘service station’? A ‘doctor’ or a ‘medical practitioner’? And so forth and so forth.

We were doing tagging in the local space before anyone else (for over 18 months now), and users have taken it upon themselves to tag. Yet the same user can use different words when tagging identical businesses (‘dry cleaners’ vs ‘laundry’, even when they provide the exact same service).

Our team has been slogging through the categories used in iBegin Source for roughly the last month, and I’ve never come across a bigger headache. Our task was relatively simple: merge, rename, and prune the categories so that they are simpler to use and more obvious. But the breadth of business listings is enormous. Even getting it down to 10,000 categories is a task not for the faint-hearted (talk about constant cross-referencing against possible matching categories).
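The merge/rename/prune pass can be sketched as an alias table from old category names to canonical ones. This is a minimal sketch under stated assumptions: `CATEGORY_MAP`, the category names in it, and the `min_listings` threshold are all hypothetical, not iBegin Source’s actual taxonomy or process.

```python
from collections import Counter

# Hypothetical alias table: old category name -> canonical category.
# Several old names pointing at one canonical name is a merge;
# one old name pointing at a new name is a rename.
CATEGORY_MAP = {
    "Dry Cleaners": "Dry Cleaning & Laundry",
    "Laundry": "Dry Cleaning & Laundry",
    "Laundromats": "Dry Cleaning & Laundry",
    "Gas Stations": "Gas & Service Stations",
    "Service Stations": "Gas & Service Stations",
    "Physicians": "Doctors",
}


def consolidate(listing_categories, category_map, min_listings=2):
    """Remap every listing's categories to canonical names.

    Returns (remapped listings, under-used categories). Categories used by
    fewer than `min_listings` listings are flagged for manual review
    (pruning candidates) rather than dropped, so no listing silently
    loses a category.
    """
    remapped = {
        listing_id: sorted({category_map.get(c, c) for c in cats})
        for listing_id, cats in listing_categories.items()
    }
    counts = Counter(c for cats in remapped.values() for c in cats)
    under_used = sorted(c for c, n in counts.items() if n < min_listings)
    return remapped, under_used
```

For example, a listing filed under "Dry Cleaners" and another under "Laundry" both end up in "Dry Cleaning & Laundry". Flagging instead of deleting is a deliberate choice here: pruning is where the cross-referencing headache lives, so a human makes the final call.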

So where do we end up?

The core data needs structure. At iBegin we originally attempted extremely loose categories: 8 in total, with tagging to handle the rest. Even that caused problems: what about an establishment that is a restaurant until 10 pm and then exclusively a bar from 10 pm to 2 am? Tagging, on the other hand, was great in two ways: it allowed users to participate in a simple way (adding a word or two is relatively trivial), and it improved our metadata (the most important quality in local search). Multiple categories (eg the place is both a restaurant and a bar) + tagging = where you want to be.

So what’s the conclusion?

Categories are needed from a top-down level in order to classify businesses properly. A purely user-based system cannot work, because too much freedom leads to a mess that cannot be properly organized (much less properly monetized). Tagging on top is a great way to build up a taxonomy: cheap metadata creation that augments your core classification.
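A minimal sketch of "multiple categories + tagging", assuming a hypothetical `Listing` record and a hypothetical fixed `CATEGORIES` set: categories are validated against the top-down list, while tags are free-form user input that only widens what a search term can match.

```python
from dataclasses import dataclass, field

# Hypothetical top-down category list, controlled editorially.
CATEGORIES = {"Restaurant", "Bar", "Dry Cleaning & Laundry"}


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so 'Late  Night' == 'late night'."""
    return " ".join(text.lower().split())


@dataclass
class Listing:
    name: str
    categories: set[str]                          # one or more controlled categories
    tags: set[str] = field(default_factory=set)   # free-form user tags

    def __post_init__(self):
        # Categories must come from the fixed list; tags need not.
        unknown = self.categories - CATEGORIES
        if unknown:
            raise ValueError(f"unknown categories: {sorted(unknown)}")

    def add_tag(self, tag: str) -> None:
        # Users can add any word or two; we only normalize it.
        self.tags.add(normalize(tag))


def matches(listing: Listing, term: str) -> bool:
    """A search term matches a controlled category or any user tag."""
    term = normalize(term)
    return term in {normalize(c) for c in listing.categories} or term in listing.tags
```

Usage: the restaurant-that-becomes-a-bar gets both categories, and a user tag adds metadata the categories lack.

```python
pub = Listing("Corner Pub", {"Restaurant", "Bar"})
pub.add_tag("Late  Night")
matches(pub, "bar")         # True, via a category
matches(pub, "late night")  # True, via a user tag
```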

Local - Two sides to every coin.

I read Himmelstein on G’s Local Biz Referral Program with interest. I find Marty’s musings very thought-provoking and much deeper than 99% of the local-search talk out there.

But I also feel the need to disagree (to a certain degree): some things touted as positives have a negative side.

To begin with, Google’s Business Referral Program is rather cunning: pay $10 to get a business hooked onto Google, and use college students to do it (ie people with lots of free time who are bad at valuing time vs money).

First off, $10 is a pittance. Think of the customer acquisition costs the cellphone companies go through. Heck, web hosting companies are paying $65 commissions for hosting accounts that pay $10 a month. The program itself is skewed: you only get $2 when a business referral is approved, and then $8 when the business itself verifies the information. For all the trumpeting of ‘Google pays for referrals’, the fine print shows that it ain’t no cakewalk.

Next up: decentralization. Not a good idea imo. You need structure and organization when dealing with businesses. Imagine you are a popular pizza joint that serves a lot of college students. Suddenly you have a dozen students a day trying to get you signed up with Google. Is that going to make you happy or annoyed? And think of the college student: he goes to the pizza joint, asks about Google Local, and the business owner angrily replies “I’ve already been asked”. Ants (to use that analogy) are successful because they work in a very organized and intelligent manner. You can’t just unleash people with zero direction and expect everyone to be happy.

Continuing on: structured content. This is where I will admit that if anyone can understand data, it would be Google. But different people have different viewpoints on the same thing. Back to the disorganization: without a succinct focus on what is acceptable (and what isn’t), you can be deluged with data you can’t deal with. With data providers, even categorization is a huge headache, since identical businesses want to be categorized differently. Throw in business-specific data and handling it properly becomes even harder.

Further along: completeness. The demographic everyone talks about reaching is students. Last time I checked (and I was a student less than three years ago), students are damn poor, which means the area they frequent is relatively small. Furthermore, places with students are usually pretty well covered; what about places that don’t have an active student population? Even setting students aside, the last mile is far easier to deal with around a university campus than in most other places.

Lastly: updates. If there is anything difficult about business data, it is keeping it updated. What matters is what the businesses that do sign up with Google are doing a year later: do they remain involved, or do they stop bothering?

I don’t claim to have the solution to the problem of connecting offline businesses online. But I do think Google’s plan could be a lot better.

The thing about local blogs is that unless you live in the city one covers, you likely don’t know it exists.

The only reason I know of Gothamist LLC was because of their Toronto blog - Torontoist.

So with that in mind, I’ve been amazed at how established some local blogs are.

While reading Bloggers Bring in the Big Bucks (to be honest, some of them were small-fry), I saw that Gothamist was mentioned. It was slightly confusing: the revenue line says ‘monthly average of $50,000 to $60,000 over the past 12 months’, yet the first line says ‘estimated monthly revenues of $250,000’. But with an estimated 7,000,000 pageviews a month (I don’t think even Yelp manages that), $250,000 a month sounds right.

To put it another way: a local blog network covering 14 cities generates roughly $3,000,000 a year. Hell, this would be a smart buy for a Yellow Pages company or a media company.
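The $3,000,000 figure is simply the $250,000 monthly estimate times 12. A quick check using only the numbers quoted above also shows what each figure implies per 1,000 pageviews, which makes the gap between the two quoted revenue numbers concrete. Framing it as revenue per 1,000 pageviews is my own choice of yardstick, not something the article reports.

```python
# Figures as quoted in the post; nothing here is new data.
MONTHLY_PAGEVIEWS = 7_000_000
ESTIMATED_MONTHLY_REVENUE = 250_000        # "estimated monthly revenues of $250,000"
REPORTED_MONTHLY_RANGE = (50_000, 60_000)  # "monthly average of $50,000 to $60,000"


def revenue_per_thousand(revenue, pageviews):
    """Revenue earned per 1,000 pageviews."""
    return revenue / (pageviews / 1000)


annual = ESTIMATED_MONTHLY_REVENUE * 12
print(annual)  # 3000000
print(round(revenue_per_thousand(ESTIMATED_MONTHLY_REVENUE, MONTHLY_PAGEVIEWS), 2))  # 35.71

# Annual total and per-1,000-pageview revenue for the lower reported figures.
for monthly in REPORTED_MONTHLY_RANGE:
    print(monthly * 12, round(revenue_per_thousand(monthly, MONTHLY_PAGEVIEWS), 2))
# 600000 7.14
# 720000 8.57
```

The two quoted figures differ by a factor of roughly four to five, which is why the article reads as contradictory; the $3,000,000 estimate holds only if the $250,000 figure is the right one.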

Next we have b5media acquiring Level9. Level9’s most popular blogs are their Starked blogs, which cover NYC, SF, LA, and DC. While each blog veers into a broad category (eg NYC covers media, LA covers Hollywood, etc), they still publish local content.

As an aside, JLA Ventures, the people behind Zip Local, are the investors behind b5media.

Lastly, we have Mediabistro.com purchased by Jupitermedia (the guys behind internet.com) for a cool $23 million. Among their best-known blogs are the Fishbowl blogs, covering NY, LA, and DC, though they focus on media matters rooted in each city. And with only 600,000 unique visitors last year, they must know how to monetize the sites well.

So - three local blog networks, all kicking ass and taking names. Gothamist’s numbers are impressive, and Mediabistro got a nice buyout. How come we never hear about them from local analysts?

Please Stop Delivering The Phone Book To My House

The traditional Yellow Pages companies depend on the number of books distributed to charge their advertisers the maximum possible. As such, it creates an (obvious) incentive to keep distribution numbers as high as possible.

People commonly talk about the real distribution number vs the stated number; stacks of unused YP books are a very common sight. By the dozens. Everywhere.

So I had a hearty laugh when I saw that this very complaint was on the front page of Reddit. While we can all agree that Reddit’s audience is far more tech-oriented than the average user, it still underlines how people have moved en masse from the yellow pages to online. And in most cases, these tech users (early adopters) set the tone for the future: be it video games, computers, or the internet, these people are a harbinger of what is to come.

There is also anecdotal evidence that the YP companies are not lowering their ad rates. So with (obvious) declining readership (ie actual users), when will ad rates be reconciled with true distribution?

The YP companies are basically playing with a poker face: the moment any one of them breaks down and starts lowering rates or revising distribution numbers, the rest will be dragged along, kicking and screaming. Alas, none of them has had the guts (yet) to accept reality.

And here comes is My Home

A few days late, but better late than never.

Today Bloggy Network announced the launch of is My Home, a mini blog network focused on cities.

It launched in nine cities, and we should be in at least a dozen by month’s end.

This all ties into iBegin’s increasing reach into the local space. While not part of iBegin itself (iBegin and Bloggy Network are two separate companies), you can be sure they will be working together.

GeoData (and creating it)

I have a post over on the iBegin Blog about messy geodata. I’ve also included a little peek at something iBegin should be rolling out soon.

Local Search - what powers who

There are a few headaches with local. While my current headache is as ambiguous as you can get, the persistent headache has always been search.

Search isn’t fun. Providing relevant results for what a user is looking for is a painful task: you take what the user searched for, try to map what they typed onto whatever categorization you have, factor in other variables (user rating, user favorites, distance from the end user, etc), and try to return the most relevant results.
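The blending described above can be sketched as a weighted score. Everything here is an illustrative assumption, not iBegin’s actual ranking: the `Business` fields, the linear combination, the weights, the log damping of favorites, and the roughly 10 km distance decay are all hypothetical choices.

```python
import math
from dataclasses import dataclass


@dataclass
class Business:
    name: str
    categories: set
    rating: float   # average user rating, 0-5
    favorites: int  # number of users who favorited it
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def score(biz, query_terms, user_lat, user_lon,
          w_text=1.0, w_rating=0.3, w_fav=0.2, w_dist=1.0):
    # Text relevance: fraction of query terms that hit one of the categories.
    cats = {c.lower() for c in biz.categories}
    text = sum(t.lower() in cats for t in query_terms) / max(len(query_terms), 1)
    if text == 0:
        return 0.0  # no category match at all: not a result
    rating = biz.rating / 5.0
    # Log-dampen popularity; about 0..1 for up to 1,000 favorites.
    fav = math.log1p(biz.favorites) / math.log1p(1000)
    dist = haversine_km(user_lat, user_lon, biz.lat, biz.lon)
    closeness = math.exp(-dist / 10.0)  # decays over roughly 10 km
    return w_text * text + w_rating * rating + w_fav * fav + w_dist * closeness


def search(businesses, query_terms, user_lat, user_lon, limit=10):
    """Score every business, drop non-matches, return names best-first."""
    scored = [(score(b, query_terms, user_lat, user_lon), b) for b in businesses]
    scored = [(s, b) for s, b in scored if s > 0]
    scored.sort(key=lambda sb: sb[0], reverse=True)
    return [b.name for _, b in scored[:limit]]
```

With these weights a 4-star pizza place next door outranks a 5-star one about 50 km away. Tuning that trade-off, and doing it for every query rather than once, is where the pain comes from.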

Not only that, but you have to scale it.

Simple search solutions break down once you hit 100,000+ unique records. More advanced search solutions start to groan at 1,000,000+ records. And at 10,000,000+ records, you have to be smart about it. The ‘where’ isn’t a simple text-matching issue (which MySQL can handle for you): there are multiple variables unique to each search, so pre-caching popular searches is also a dead end.
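Engines like Lucene, Xapian, and Sphinx avoid scanning every record by building an inverted index: a map from each term to the set of records containing it, so a query only touches the postings for its own terms. Below is a toy in-memory sketch of that idea; the `InvertedIndex` class is hypothetical and is not any of those engines’ APIs. It deliberately covers only the text-matching part, not the per-search variables (location, ratings) that make local search harder.

```python
import re
from collections import defaultdict

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase and split on anything that isn't a letter or digit."""
    return TOKEN_RE.findall(text.lower())


class InvertedIndex:
    """Toy in-memory inverted index: term -> set of record ids."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.records = {}

    def add(self, record_id, text):
        self.records[record_id] = text
        for term in tokenize(text):
            self.postings[term].add(record_id)

    def search(self, query):
        """Return ids of records containing every query term (boolean AND)."""
        terms = tokenize(query)
        if not terms:
            return set()
        # Intersect the smallest posting lists first to keep the work small.
        lists = sorted((self.postings.get(t, set()) for t in terms), key=len)
        result = set(lists[0])  # copy, so the index itself is never mutated
        for p in lists[1:]:
            result &= p
            if not result:
                break
        return result
```

Production engines keep their postings sorted and compressed on disk rather than in Python sets, but the lookup-and-intersect shape is the same.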

Then you want to add extra features. Metaphone matching and stemming, while relatively trivial in a basic implementation, become a bit of a headache when dealing with a large number of records.
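To show what stemming and phonetic matching buy you, here is a deliberately crude sketch. It uses a naive plural-stripping stemmer (loosely modelled on step 1a of the Porter stemmer, nowhere near the full algorithm) and classic American Soundex, swapped in here as a much simpler stand-in for Metaphone, which has far more rules. Both reduce different spellings to a shared key, and that key is what would get indexed.

```python
# American Soundex digit groups.
SOUNDEX_CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        SOUNDEX_CODES[ch] = digit


def soundex(word):
    """American Soundex: first letter + three digits, e.g. Robert -> R163."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = SOUNDEX_CODES.get(first, "")
    for c in letters[1:]:
        code = SOUNDEX_CODES.get(c, "")
        if code and code != prev:
            digits.append(code)
        # Vowels separate repeated codes; H and W do not.
        if c not in "HW":
            prev = code
    return (first + "".join(digits) + "000")[:4]


def naive_stem(word):
    """Strip common plural endings only; short words are left alone."""
    w = word.lower()
    if len(w) <= 3:
        return w
    if w.endswith("sses"):
        return w[:-2]         # classes -> class
    if w.endswith("ies"):
        return w[:-3] + "y"   # bakeries -> bakery
    if w.endswith("ss"):
        return w              # glass stays glass
    if w.endswith("s"):
        return w[:-1]         # cleaners -> cleaner
    return w
```

For example, `soundex("Smith")` and `soundex("Smyth")` both give `S530`, and `naive_stem("bakeries")` gives `bakery`. The headache at scale is that every record must be re-keyed whenever these rules change, and phonetic keys are coarse (Soundex maps many unrelated names to the same code), so they work better as a fallback than as the primary match.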

When all is said and done, generating our search engine (a one-time job) takes about 72-96 hours. That is for the new version; the current one on iBegin Source is quite slow, and the new one should be roughly 5x-10x faster. It should finally be live sometime next week.

I’ve always wondered what our competition uses.

A while ago I accidentally stumbled upon Judy’s Book’s search partner, Launch 21. Their blog is quite detailed about the process behind Coupon Looker: interesting stuff.

So today I was intrigued when I saw Yelp’s Jobs page (titled ‘About us’ for some reason). The first job posting was for a Senior Software Engineer, and the word that caught my eye was Lucene.

I’ve spent a lot of time researching and understanding larger search systems, like Lucene, Xapian, Sphinx Search, and others. All are quite nice (and pack power), but molding an existing search system for local search is neither an easy nor a fun task.

I’m curious what other companies (too many to name) are doing. Any clue? Are they using custom systems (like us), outsourced custom ones (like JB’s Coupon Looker), or a modified existing system (like Yelp)?