Local Search - what powers who

There are a few headaches with local. While my current headache is as ambiguous as you can get, the persistent headache has always been search.

Search isn’t fun. Providing relevant results for what a user is looking for is a painful task, really - you take what a user searches for, try to build relationships between what they put in and whatever categorization you have, factor in other variables (user rating, user favorites, distance from end user, etc.), and try to return the most relevant results.
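To make the blending of signals concrete, here is a minimal sketch of a ranking function. It is not how our engine works; the `Listing` fields, the 0-5 rating scale, and every weight are assumptions chosen purely for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Listing:
    name: str
    category: str
    rating: float        # average user rating, assumed to be on a 0-5 scale
    favorites: int       # number of users who favorited the listing
    distance_km: float   # distance from the searching user


def text_match(query: str, listing: Listing) -> float:
    """Fraction of query words found in the listing's name or category."""
    words = query.lower().split()
    if not words:
        return 0.0
    haystack = f"{listing.name} {listing.category}".lower().split()
    hits = sum(1 for w in words if w in haystack)
    return hits / len(words)


def score(query: str, listing: Listing) -> float:
    """Blend text relevance with rating, favorites, and distance.

    The weights below are illustrative guesses, not tuned values.
    """
    text = text_match(query, listing)
    if text == 0.0:
        return 0.0  # no textual relevance at all: drop the listing
    rating = listing.rating / 5.0
    popularity = min(math.log1p(listing.favorites) / math.log1p(1000), 1.0)
    proximity = 1.0 / (1.0 + listing.distance_km)
    return 0.5 * text + 0.2 * rating + 0.1 * popularity + 0.2 * proximity


def search(query: str, listings: list[Listing], limit: int = 10) -> list[Listing]:
    """Return the highest-scoring listings, best first."""
    scored = [(score(query, item), item) for item in listings]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored[:limit]]
```

Even in this toy form you can see the scaling problem: distance differs for every user, so the score has to be computed per search rather than stored once.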

Not only that, but you have to scale it.

Simple search solutions break down once you hit 100,000+ unique records. More advanced search solutions start to groan under 1,000,000+ records. And at 10,000,000+ records, you have to be smart about it. The ‘where’ isn’t a simple text matching issue (which MySQL can handle for you) - there are multiple variables that are unique in each search. So pre-caching popular searches is also a dead-end solution.

Then you want to add in extra features. Metaphones and stemming, while relatively trivial (in basic implementation), become a bit of a headache when dealing with a large number of records.
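As a rough illustration of the phonetic side, the sketch below uses Soundex as a simpler stand-in for Metaphone (it is a different and cruder algorithm) and leaves stemming out entirely. The `build_index` and `lookup` helpers are hypothetical names, not part of our system.

```python
from collections import defaultdict

# Standard Soundex consonant groups; vowels, h, w and y carry no code.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word: str) -> str:
    """Return the 4-character Soundex key for a word ('' if no letters)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    encoded = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (encoded + "000")[:4]


def build_index(names):
    """Map each phonetic key to the ids of the records containing it."""
    index = defaultdict(set)
    for record_id, name in enumerate(names):
        for word in name.split():
            key = soundex(word)
            if key:
                index[key].add(record_id)
    return index


def lookup(index, query):
    """Return ids of records that phonetically match every query word."""
    ids = None
    for word in query.split():
        matches = index.get(soundex(word), set())
        ids = matches if ids is None else ids & matches
    return ids or set()
```

With this, a search for "Macdonalds" lands on the same key as "McDonalds". The headache at scale is that every record contributes several keys, so the index grows quickly and each extra feature adds another pass over all the data.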

When it is all said and done, generating our search engine (one time) takes about 72-96 hours. This is the new version - the current one on iBegin Source is quite slow - the new one should clock in at roughly 5x-10x faster. And it should finally be live sometime next week.

I’ve always wondered what our competition uses.

A while ago I accidentally stumbled upon Judy’s Book’s search partner - Launch 21. The blog is quite detailed about the process behind Coupon Looker - interesting stuff.

So today I was intrigued when I saw Yelp’s Jobs page (with the title ‘About us’ for some reason). The first job posting was for a Senior Software Engineer - and the word that caught my eye was Lucene.

I’ve spent a lot of time researching and understanding larger search systems, like Lucene, Xapian, Sphinx Search, and others. All are quite nice (and pack power), but molding an existing search system for local search is neither an easy nor a fun task.

I’m curious what other companies are doing (too many to name) - any clue? Are they custom (like us), outsourced/custom (like JB’s Coupon Looker), or an existing system modified (like Yelp)?


Backfence: Another HyperLocal site bites the dust

I read this first on The Local Onliner: BackFence, one of the more visible hyperlocal sites, has bitten the dust.

BackFence almost feels like the old grandpa of the entire hyperlocal market - there are now half a dozen other sites all trying to crack hyperlocal (including a few of our iBegin Source customers). I find it remarkable that even with all the hyperlocal sites dying, new ones keep coming up.

Mind you - I am a big believer in hyperlocal. Our first foray into local was hyperlocal - we built a local community website for the little block we lived in. My fiancee walked around and took pictures of every place in the neighborhood, manually entering them into the database.

The uptake was amazing. Within two months the site was doing roughly 200-300 unique visitors a day. For a two block area, and with minimal promotion, it was quite an eye opener - it let us know that people are (in large quantities) attempting to find and connect locally.

Alas the site no longer remains. It was on one of our older servers, which suffered a catastrophic failure. The entire site was lost.


Local & Franchises: Why is the data locked?

The most popular specific searches across our local properties are always brand names - Sears, [local grocery chain], McDonalds, etc etc.

What has always boggled my mind is that if you visit the site of a franchise/national brand (eg Sears or McDonalds), they all have a franchise locator. It is an obvious feature that people would like.

But what absolutely boggles my mind is why this data is locked. If I were McDonalds, I would want to make sure everyone knew where you could find McDonalds. I would want to make sure all closed down stores were not listed. I would make this data available for free.

The reality is, no matter how much they may want to crawl back into their cocoon, people use other sites to find their brands. Even niche players get a significant enough chunk of traffic. Sure, McDonalds may talk directly to SuperPages.com. And YellowPages.com. You can argue that it isn’t profitable enough for them to have a direct relationship with everyone - fine. But is it really in their interests to allow other companies to publish bad data? Of course not.

If I could sit down with the person responsible for their internet operations, I would just have one question to ask: “Why don’t you allow anyone and everyone to download a list of all [insert name here] locations? All these people are doing is *promoting* your business.”

There is a lot of talk about walled gardens of data, and how web 2.0 is supposed to change that. There are some legitimate reasons for walled gardens, but for franchise data? None.


iBegin Geocoder: Ready for Prime Time

And I am happy to announce the launch of our geocoding service - iBegin Geocoder.

We now feature:

Whew :)


iBegin Geocoder: Soft Launch

The first of my previously mentioned three releases, this is a soft launch of our geocoding service: iBegin Geocoder.

Basically enter any address in the US/Canada, and it can convert it to GPS coordinates. Enter any lat/long in the US/Canada, and it can give you the address it translates to, the nearest intersection, and the nearest major intersection.
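For a feel of the reverse direction, here is a minimal sketch of finding the nearest intersection to a lat/long using the haversine great-circle distance. It is not our implementation; the dictionary layout with `name`, `lat`, `lon`, and `major` keys is an assumption made for the example, and a real geocoder would use a spatial index rather than scanning every intersection.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_intersection(lat, lon, intersections, major_only=False):
    """Return the closest intersection, optionally only among major ones.

    Each intersection is a dict with 'name', 'lat', 'lon' and 'major' keys
    (a hypothetical layout). Returns None if there are no candidates.
    """
    candidates = [i for i in intersections if i["major"] or not major_only]
    if not candidates:
        return None
    return min(candidates,
               key=lambda i: haversine_km(lat, lon, i["lat"], i["lon"]))
```

Calling `nearest_intersection(lat, lon, data)` and then again with `major_only=True` gives the two answers described above: the nearest intersection and the nearest major intersection.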

Any application that works with user locality in US/Canada needs geocoding.

It should be fully complete and ready for commercial usage by Wednesday.

In the meantime - if you find any bugs, drop a comment.

UPDATE: Just wanted to add - the system is likely slow right now as we are working on some internal mechanics on the server. When we launch it will be butter smooth.

UPDATE 2: And our geocoder has launched.


Geomas: We own the patent on local search

Wired has the story of how Geomas is suing Verizon, claiming that it infringes patent No. 5,930,474, for an “Internet Organizer for Accessing Geographically and Topically Based Information.”

The repercussions (if Geomas wins) could be far-reaching into the local sphere -

The patent describes an internet search functionality in which users can locate a topic or business based on their location. If you’ve ever looked for a nearby doctor or plumber online using your ZIP code or city, according to Geomas, the site you used likely infringed upon the patent. “In a perfect world, we commercialize the technology and grab licensing fees,” said Jason Galanis, founder of Geomas, which was formerly called Yellowone Investments. “We aren’t necessarily looking to sue as our main business, but realistically I think that’s going to have to happen.”

Praized has a few more thoughts.

  • 2 Comments |

iBegin Source has been a learning experience that has opened my eyes a lot - sales cycle, perceived value, etc etc.

We’ve had a lot of experience with doing small sales (< $500) - eg ForumTemplates (in the last 22 minutes the site has had five sales at $17.00 each). Automated processes, quick and detailed instructions, forum for general support (everyone can ‘learn’ together), etc.

iBegin Source has been a different beast. A few of the points we have learned:

  1. Perception. Price in itself is a huge part of the perception. Would a Ferrari be a Ferrari if it was priced at $100,000? Whenever you sell something, you have a markup - the revenue on a sale minus (cost of goods + cost of sales). So when people see $40,000/$1000 they think - “this can’t be right” or “they must just be resellers”. We are neither. We’ve invested a lot into this project, but data sales are not the be-all and end-all. If we don’t make a penny from sales, we are still okay. The reality is that we created this system as a way of powering iBegin city sites across North America. We have only ourselves to blame - our story (which is unique) isn’t published properly. Need to fix that.
  2. People are demanding. Take geocoding, for example - it’s an imprecise science. No one has 100% coverage (it is impossible - there is no central place for all geodata. Even the US government’s data is incomplete). Managing expectations is hard - I’m not saying bad data is okay, I’m saying that just like you didn’t expect to get 100% on every test you took, local data is so dirty and murky that the same rules apply here. The only solution? Honest answers. We had a buyer of Florida data point out an error we had. We immediately thanked him, and said we would get back to him. In 24 hours we had identified the problem (incidentally it only affected Florida), explained what had happened, and let him know we had fixed it.
  3. The sales cycle is long. It can be very long. We have big (ie massive) companies who contacted us on the day we launched, and take weeks to reply. Various department heads ask the same questions. Sometimes the person him/herself asks the same question again. On the individual/start-up level, the same questions exist. A big question - can they trust us to be around in a year? Five years? (more on that in a bullet point below). We had one customer contact us within two weeks of us launching. Last week he purchased one state (an 80 day turnaround). He has promised to buy the full data, but I assume that will take another 2-3 months. So almost 6 months to go from initial contact to a full sale. That isn’t a small cycle (then again quite a few have bought without even contacting us, so there are two sides to every story).
  4. Branding. This is different from perception - perception is more about the quality of the data, whereas branding is more about the quality of the company. Can we be trusted? If we aren’t funded, how are we doing this? Are we scraping results? (to answer those: yes, we’ve grown organically over many years, and no). This again points to ‘our story’ - it is unique, and we should be proud of it (not that we aren’t, we just don’t show it off). The old adage of “no one ever got fired for buying from IBM” bites us square in the butt here.
  5. Flexibility. The aforementioned forum templates sales are easy - it is a design, and you buy it. Some customers asked that the header be customized, but that is all very easy to do. When it comes to business data - requests are all over the ballpark. From a single category in a single city (eg ‘Vegas Restaurants’) to asking to use data on up to a million domains (seriously), people have different needs. In most cases we have stuck to the system. Our entire pricing system focuses on efficiently handling scale and sales - 10 customers or 1000 customers, our internal backend works the same. In some special cases we do bend (eg using the data on multiple domains), but in almost all cases we stick to the system. In the short-term this may not be the best idea, but it lets us focus on the core data and the systems that deliver it. I believe this approach will bear fruit in the future.
  6. Misinformation. iBegin does local - our internal motto is “We do Local”. That confuses people sometimes - why not focus on one thing? (because everything is related - by bringing it all ‘in house’ we can ensure much better quality and reliability throughout the entire site). Competitors haven’t helped - I’ve received some choice quotes from prospective customers where they were given totally incorrect information (the one that grates on my nerves the most is that we crawl the internet for our information).
  7. Proof. This one is the hardest - people want to see sites using our data. I’ve been shown some very interesting sites using our data, but they aren’t ready to launch. We launched less than 90 days ago, our sales cycles are long, and building a website with a ton of data isn’t easy. The one benefit we do have is that some launched quickly (eg RestuarantReviews.com), and we have our own iBegin City sites to show off.

All in all - very different from our previous sales experience (through our customers and our own stuff, we’ve pushed over $20,000,000 worth of ‘goods’ over the years).

So with all this in mind, we are going to slightly change our approach. Sales cycle, proof, and (to a certain degree) branding & perception - those are things we don’t have control over. We know we’ve been around for a while. We know we are cash-flow positive. We know that this data is mission-critical to us. But we cannot prove that immediately. Over time, people will see that not only are we still around, but we are thriving. Our sales are already up - time will only help.

In the meantime? We focus on iBegin v3 and iBegin Partners - more on that soon.

  • 1 Comment |

My recent posts have included one about Google opening up the directions API, and about Loki and its geo-location systems.

The next flood is open APIs - everything is opening up, and while it is exciting, it is also a bit overwhelming.

Beyond the above two (all great fits for iBegin), we have Garmin releasing an API to interact with its devices, we have Google Mapplets, and Facebook’s shift into a platform. And those are only a few. What about integration with login systems like OpenID and Yahoo? Exporting capabilities so others can create too?

I think we are reaching the point where so many powerful (ie - highly trafficked) sites have open APIs that it is becoming more and more important to have someone full-time mashing your data with these systems. The examples I gave above are all perfect fits - figure out the closest gas station using Garmin. A mapplet for important categories like cafes or fast-food. A module so Facebook users can not only search but also incorporate their reviews, pictures, and events into the system. Allow Yahoo!/OpenID/Google ID users to log in so they don’t have to create yet another account.

And the list goes on and on - whew … keeping up is becoming harder and harder.

There is a lot of talk about walled gardens et al, but I believe with the hyper-activity now going on in building out APIs that anyone can use, it is becoming more important to just be everywhere. Users don’t like being forced one way or another - but they do like it when you support a multitude of systems.

Companies were initially afraid of search engines - but then became best buddies with them because of all the traffic they sent. The same thing happened with social networks - companies were very resistant at first, but now you see Digg and Del.icio.us links everywhere. Sure they send traffic to Digg/Delicious/et al, but they get a lot of traffic back. And the same thing is going to happen (especially in the local space) with all these open APIs. Garmin works hard to get its users. Google is always angling for new ways to keep users on its site. Facebook works hard to keep users on its site. It makes sense to leverage their platforms to get more traffic to your sites.

Think about this - a user (with a Yahoo account) ends up on your website. They want to add a review - but have to be logged in first. In one situation, you require them to create a new account. In another, they can log in using their Yahoo account. The choice should be obvious.

I believe the ‘winners’ will be those who are found everywhere, on all the major platforms.


Embedded YP Listings

I’m not sure anyone else does this (as far as I know - they don’t), so I’m gonna toot my horn on this: customizable embedded YP listings.

An example (zero branding) of Ra in Scottsdale, AZ:

You can also see customization options here.


Google Maps API now features Directions

I usually eschew posting news, but to me, this is big.

Most sites I have come across that use Google Maps API send visitors to Google Maps to get directions. No need anymore - Google has released driving directions as part of their API.

This really makes me pause - why would they do this? I’m sure the directions issue was pushing a fair bit of traffic to Google Maps - are Yahoo/MSN/Ask/Mapquest far behind?
