Online video has been growing exponentially over the last few years. And while digital video advertising is a burgeoning marketplace, one area where video severely lags behind is search. Even as real-time data finds its way onto search engines, the growing cache of online video content remains almost entirely beyond the reach of search.

That’s something that startup SpeakerText is hoping to change. The company has created technology that links transcripts to video, making that content easier to watch, track and search. The company began last year with a budget of $4,000 and an engineering team paid in iPhones. Before even launching a product, it has corralled Meebo founder Seth Sternberg as an advisor and piqued the interest of venture capitalist Fred Wilson.

The company is launching tonight at New York Tech Meetup. I caught up with SpeakerText founder Matt Mireles to talk about his hopes for SpeakerText, how efficient video search will be here within five years and why he’s not worried about competing with YouTube.

How will SpeakerText change the online video viewing experience?
Video is currently linear. If you want to find anything in particular, you don’t actually know what you’re going to find when you get there. SpeakerText is better than just a transcript. We take video and, combining it with the YouTube API, the transcript becomes part of the video itself. Click on the part of the transcript you want and you’ll be directed to that section of video. You can search the video using the text, but what’s really useful is this: let’s say you see a quote. You can click on the quote and it copies that quote and hyperlinks it back to the original video. When that’s embedded in a blog, your reader comes along, sees the quote, clicks on it and goes back to SpeakerText, which starts the video at, say, 1:22. You can actually go to the direct source, and readers don’t have to pull their hair out trying to find a quote.
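The quotelink flow Mireles describes — copy a quote, get a link that starts the video at the moment it was spoken — could be sketched roughly like this. The URL scheme, domain, and function names here are hypothetical, assumed purely for illustration, not SpeakerText’s actual implementation:

```python
# Hypothetical sketch of a "quotelink": a transcript quote wrapped in a
# hyperlink that deep-links into the source video at a given timestamp.
# The domain and ?t= parameter are illustrative assumptions.

def make_quotelink(quote: str, video_id: str, seconds: int) -> str:
    """Wrap a quote in an anchor that jumps to the video at `seconds`."""
    minutes, secs = divmod(seconds, 60)
    # e.g. a player page that seeks to 1:22 via a ?t= query parameter
    url = f"https://speakertext.example/v/{video_id}?t={seconds}"
    return f'<a href="{url}" title="Jump to {minutes}:{secs:02d}">{quote}</a>'

link = make_quotelink("Read my clip", "abc123", 82)
print(link)
```

Pasting a link like this into a blog post is what lets a reader click the quote and land at 1:22 in the original video rather than hunting through it.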

Our little tagline is “read my clip.” You can actually read the whole thing. If you’re watching a video using SpeakerText, that means you can watch the first ten minutes of the video, read the next 40 minutes and decide to watch the last 20 minutes. Once you have text that you can access and interact with, so much stuff becomes possible.

How important is a tool like this to the future of online video?
I can’t believe that nobody invented this already. I think that within five years, people aren’t even going to remember what it was like before we had this sort of interface of text linked to video that’s searchable, linkable, and quotable. This will be standard. The question for me is whether SpeakerText will be the one they associate with it, or someone else.

If you search on YouTube today, it’s kind of like searching the web before Google. The results are terrible. But when you search SpeakerText, even in its very basic form today, the result isn’t just video. When you click on a result, it takes you to the exact moment within the video. We can do this with any video that uses SpeakerText. If we can really scale and grow, we can start eating into YouTube’s search market share.

How are you planning to monetize?
Right now we have to create a useful product. But we see the bulk of our revenue coming from professional uses. Any time someone copies a quote with SpeakerText, that quote links back to the original video at the appropriate time where the quote came from. Today, all the quotelinks point back to SpeakerText. From a product design perspective, that is really useful for us. But once we prove the efficiency of that model and work the bugs out, we want to create a professional version that links directly to publishers’ websites. That way, the end user gets to see the real source and doesn’t have to rely on third-party interpretation. But the original site, like say TechCrunch, gets credit for the work they’ve done putting the video out there.

Where does SpeakerText fit into the online video marketplace?
We’re working to be a lightweight layer of time-based metadata between the video stream and the end user. We’re not trying to be another YouTube or BrightCove; we’re a little layer between the two that makes the video exponentially more useful.
To get that metadata, you either need machines or people. To get reliable transcription right now, you need human beings. We tap into Mechanical Turk to power all of the transcription. We’ll soon be rolling out APIs that third-party transcription services can use. But we’re a tech company first, not a transcription company. Right now we support YouTube videos, and we’re working to support other hosts.

How do you see yourself in relation to YouTube?
Our customers are video publishers and we kind of want to make life easier for them. My hope is that 8-10 months from now, SpeakerText will be a one-stop shop. You can create a video, then SpeakerText it and syndicate it to all these other places. Once you have the transcript, it’s just a matter of tweaking a couple of things to make it work with different platforms. At the end of the day, SpeakerText can serve publishers better than YouTube, because we’re a little bit more focused on the publishers.

That’s the thing a startup has: a narrowness of focus. We’re not an advertising company, so we don’t have the same problems. What’s Google trying to do? Make YouTube profitable. Part of how they’re doing that is with advertising that kills the user experience. We don’t really have that luxury. We have to make people get mouthwateringly excited about our product.

How will searchable video help publishers and brands?
SpeakerText in particular, with the advent of the quotelink, is going to be really useful for marketers and publishers. Today you do some really hot video and oftentimes people will embed and share that video directly. But a lot of how people learn about your videos is when people quote or talk about them. When a video gets popular, people will share it, but they’ll also quote it. Right now, there’s no effective way to link the quote to the video. SpeakerText makes it easier to get the quote.

What are the implications of searchable video?
Searchable, readable, browsable video will give publishers an incentive to publish more video, because end users will find those videos more useful and user-friendly.

One hard thing right now is serving contextual advertising against video. If you don’t know what the hell is going on inside of a video, it’s really hard to have a video version of AdSense that can analyze the text and show an ad that makes sense against it. Once you have text for video, advertising can be contextual not just to a video, but to a particular moment in a video. It will be a boon for end users and advertisers and publishers.

Why did you decide to launch at New York Tech Meetup?
NYTM is an awesome venue and it just happened to really work well with our timing. Most of the people on our team are also students. My cofounder Björn Liljequist is in architecture school right now and the two engineers we hired, they’re actually graduating seniors — at Cooper Union and University of Rochester. They’re on winter break, which makes it easy to do 18-20 hour days of database migration and debugging and stuff like that. So the timing worked out, and we just needed to put a stake in the ground and decide that whatever the hell we have ready by then, we will launch. At NYTM if you fail you fail, but if you do something cool, people will really like it and you can get some real notoriety from it.