I read an excellent post on structured vs unstructured data in the local space.

The problem with local data is an impossible human problem. People think differently. What is beautiful to me could be ugly to you. What is a kebab to me could be a skewer to you. One person's car could be another's piece of trash, and so forth.

In a related blog post, there was a discussion on building a better database. I’m not sure what Yellowbot was doing there (they just use Localeze data), but I am glad they were.

The entire argument for using a tagging system as your ‘base’ is shortsighted, mostly because (as I explained) people don’t see things the same way. My previous examples were generic - it gets even more confusing at the local data level. Is it a ‘gas station’ or a ‘service station’? A ‘doctor’ or a ‘medical practitioner’? And so forth.

We were doing tagging in the local space before anyone else (over 18 months now). You can see that users have taken it upon themselves to tag. Yet the same user can use different words when tagging identical businesses (‘dry cleaners’ vs ‘laundry’ - even when they provide the exact same service).

Our team has been slogging through the categories used in iBegin Source for roughly the last month, and I’ve never come across a bigger headache. Our task was relatively simple - merge, rename, and prune the categories so that they are simpler to use and more obvious. But the breadth of business listings is enormous. Even getting it down to 10,000 categories is not a task for the faint-hearted (talk about constant cross-referencing to possible matching categories).
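
The merge-and-rename work can be pictured as a lookup table from variant labels to a single canonical category. Here is a minimal Python sketch, assuming a hand-built mapping. The names `CANONICAL`, `canonical_category` and `merge_counts`, and the choice of which label counts as canonical, are hypothetical illustrations and not the actual iBegin Source taxonomy:

```python
# Hypothetical merge table: variant label -> canonical category.
# The entries reuse examples from this post; which side is "canonical"
# is an assumption made purely for illustration.
CANONICAL = {
    "service station": "gas station",
    "medical practitioner": "doctor",
    "laundry": "dry cleaners",
}


def canonical_category(label: str) -> str:
    """Normalize a label and map it to its canonical category if one is known."""
    key = " ".join(label.lower().split())  # lowercase and collapse whitespace
    return CANONICAL.get(key, key)


def merge_counts(labels):
    """Count listings per canonical category after merging variants."""
    counts = {}
    for label in labels:
        category = canonical_category(label)
        counts[category] = counts.get(category, 0) + 1
    return counts
```

The table itself is the hard part: every entry is a human judgment call, which is why the cross-referencing takes so long.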

So - where do we end?

The core data needs structure. At iBegin we originally attempted extremely loose categories - 8 in total, with tagging to control the rest. Even that caused problems - what about the establishment that is a restaurant until 10 pm, and then exclusively a bar from 10 pm to 2 am? Still, tagging was great in two ways - it let users participate in a simple way (adding a word or two is relatively trivial), and it improved our metadata (the most important quality in local search). Multiple categories (e.g. the place is both a restaurant and a bar) + tagging = where you want to be.
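
To make "multiple categories + tagging" concrete, here is a rough Python sketch of one way a listing could be modeled: categories come from a fixed, top-down list and are validated, while tags are free-form. Everything here (`CATEGORIES`, `Listing`, `search`, and the sample category names) is a hypothetical illustration, not how iBegin actually stores its data:

```python
from dataclasses import dataclass, field

# Hypothetical fixed, top-down category list (illustrative only).
CATEGORIES = frozenset({"restaurant", "bar", "gas station", "doctor"})


@dataclass
class Listing:
    name: str
    categories: set = field(default_factory=set)  # structured, validated
    tags: set = field(default_factory=set)        # free-form, user-supplied

    def add_category(self, category: str) -> None:
        """Accept only categories from the fixed list."""
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category!r}")
        self.categories.add(category)

    def add_tag(self, tag: str) -> None:
        """Accept any non-empty tag, lightly normalized."""
        tag = tag.strip().lower()
        if tag:
            self.tags.add(tag)


def search(listings, term):
    """Return listings whose categories or tags contain the exact term."""
    term = term.strip().lower()
    return [l for l in listings if term in l.categories or term in l.tags]
```

A place that is a restaurant by day and a bar by night simply gets both categories, and user tags add whatever extra vocabulary people actually search with.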

So what’s the conclusion?

Categories need to be defined from the top down in order to classify businesses properly. A purely user-based system cannot work because too much freedom leads to a mess that cannot be properly organized (much less properly monetized). Tagging on top is a great way to build up a taxonomy - cheap metadata creation that augments your core classification.