Interview with Otto Söderlund, CEO & Co-Founder of Speechly
Are you satisfied with modern voice technology as we move into 2021? Otto Söderlund, the CEO and co-founder of [Speechly](https://www.speechly.com/?utm_campaign=blogpost&utm_content=speechly-podcast&utm_medium=blog&utm_source=website&utm_term=listen-episode), still sees a lot of shortcomings in the technology. That’s why his startup has developed technology that performs speech recognition and natural language understanding in real-time. In other words, Speechly is doing things with voice technology that the big players in that space have yet to do.
Otto was a guest on the Silicon Valley Momentum Podcast this week with Roland Siebelink. In addition to talking about Speechly’s incredible technology, Otto shared his vision for the future of voice technology and who stands to benefit the most from Speechly’s core technology:
Roland Siebelink: Hello, and welcome to the Silicon Valley Momentum Podcast. My name is Roland Siebelink and I'm a coach and scaleup ally for tech founders. Today, with us, is Otto of Speechly. Tell me a little bit more about Speechly.
Otto Söderlund: Thanks, Roland. Great to be on the show today. Speechly is a European startup focusing on the voice space. Our fundamental belief is that voice is the next paradigm shift in user interaction, a bit like the touchscreen was the last big paradigm shift, at least the last one I remember. I didn't live through the mouse and keyboard shifts. But I think that the latest big, dramatic shift is this one. We've seen a lot of buzz around voice technology in the last year. I think everybody's seen lots of these charts with voice technology going through the roof: smart speaker adoption, assistant usage, and all of that. But when I'm using my regular voice technology, it still sucks. It doesn't really work. And I think that people can relate to that. People use their voice technologies for quite mundane tasks, controlling their home or something like that. But we really haven't seen a killer application for voice yet.
Roland Siebelink: Okay. Very interesting. Tell me a little bit more, before we go into the specific products that you offer, how did you land on this space? What's the history behind Speechly?
Otto Söderlund: Great that you asked. Speechly was founded a couple of years ago, actually four to be precise, when my co-founder, Hannes, returned from his work developing the natural language understanding for Siri. In that work, he came to realize the strengths and also the shortcomings of the current generation of voice technologies. And he had a big dream. We are actually university buddies and go back a long time, and he brought together some of his all-time dream team buddies to think about this problem. That's when we started to really explore this problem in more depth. We wanted to create better voice UIs, and we initially started doing that with the existing technologies out there in the market. But quite soon we realized that they just weren't satisfactory for building what we had envisioned. That's why we embarked on developing our own technology. And I think that the biggest insight that led us to build Speechly, and especially to invest in building our own technology, was that we realized that accuracy was no longer the issue in voice technology. It's actually the feedback mechanism. If you use any assistant or smart speaker, you probably realize that when the user speaks, the assistant or smart speaker waits for the user to finish, and only then starts to process the speech. That means it first transcribes the speech into text, then processes the text via natural language understanding to extract the meaning, and then applies some basic logic to figure out what to do in the specific application. Typically after that, it generates a response using a synthetic voice. This is actually for technical reasons of how these technologies have been built. And it leads to this turn-based interaction paradigm where the user is limited to asking a question, waiting, getting an answer, waiting, and asking again.
That limits the use cases. It works really well for tasks like playing music on your Spotify or controlling your home automation. But if you want to do something more complicated, something that would create a bit more value than saving five seconds, we really haven't seen that yet. And that's the problem we're solving. We really want to enable people to interface with technology using their voice combined with other modalities, and to solve more complex tasks using speech as the main control mechanism.
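As a rough illustration of the turn-based flow Otto describes, the sketch below chains the four stages so that each one starts only after the user has finished speaking. Every function name and the keyword rule are hypothetical stand-ins chosen for this example; they are not the API of any real assistant or of Speechly.

```python
# Illustrative sketch only: every function here is a hypothetical stand-in,
# not the API of any real assistant or of Speechly.

def transcribe(audio_words):
    """Stand-in for speech recognition: joins the 'audio' into text."""
    return " ".join(audio_words)

def understand(text):
    """Stand-in for NLU: a trivial keyword rule that picks an intent."""
    intent = "play_music" if "play" in text.split() else "unknown"
    return {"intent": intent, "text": text}

def decide(meaning):
    """Stand-in for the application logic that chooses a reply."""
    if meaning["intent"] == "play_music":
        return "Playing your music."
    return "Sorry, I did not get that."

def turn_based_assistant(audio_words):
    """Run the stages strictly in sequence on the complete utterance."""
    stages = []
    text = transcribe(audio_words)   # 1. full utterance -> text
    stages.append("transcribe")
    meaning = understand(text)       # 2. text -> meaning
    stages.append("understand")
    reply = decide(meaning)          # 3. meaning -> action
    stages.append("decide")
    stages.append("respond")         # 4. the reply would be synthesized here
    return reply, stages

reply, stages = turn_based_assistant(["play", "some", "jazz"])
print(reply)
print(stages)
```

Because nothing is returned until the last stage has run, the user gets no feedback while speaking, which is the limitation discussed above.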
Roland Siebelink: Let's build a little bit on those visions for more complex applications, as you said, maybe with other modalities. What are you envisioning? Can you give us a few examples of the use cases that there's a need for in your minds but that would currently not yet be supported with the standard off-the-shelf technologies?
Otto Söderlund: Yes. Sure. One example is all kinds of end-user applications where the user wants to give a lot of information as input to the system in an efficient way. Let's take the example of shopping, where you want to find some products. You could be looking for some brands, maybe, if you're talking about fashion, some sizes, some colors, whatever. There's a lot of information that needs to be passed. And this usually involves a lot of clicking and browsing and lots of interaction with many UI elements. I would imagine that finding a product would take many minutes and many clicks. That's an example of an information-heavy task that you could replace by just saying what you're looking for: "I'm looking for black sneakers from Adidas size 13, sort them by price." Or take shopping for groceries, a task that is also quite repetitive. Our solution consists of two parts. The first is our own core technology. We've developed our own core technology that does both speech recognition and natural language understanding. But what's different about it compared with all the other technologies out there is the fact that it actually does all of that in real time. The second the user starts speaking, our system immediately starts not only transcribing the speech but also making sense of it and extracting the natural language understanding intents and entities out of it. And that is the secret magical ingredient that actually allows us to enable voice user interfaces that react in real time. We want to enable voice interfaces that mimic human face-to-face communication, where they actually react in real time to what the user is saying. The moment I've given the system sufficient information for it to be quite certain about my intents or entities, it can immediately react to that. Let's say that I'm talking about booking a sushi restaurant in New York.
The system can, while I'm still speaking, visually confirm to me that it actually understood what I'm saying. And if it made a mistake or if I changed my mind, I could very easily just correct it. I think that's the first part: our own core technology is quite unique in the market. And then the other part is a platform that allows product and development teams to very easily plug into our core technology and integrate it as part of their own products and services. If you are a product team looking for new ways to improve the user experience of your product or service, and if the task requires the users to give the system quite a lot of input, then we have the tools for these teams to find clever ways for the end users to use these products in a more efficient and more satisfying way. We have tools for the designers to design these kinds of user experiences. And we have tools for the developers to integrate very easily into our platform, including SDKs for quite a lot of different development frameworks.
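To make the real-time idea concrete, here is a minimal sketch that updates intents and entities after every word instead of waiting for the end of the utterance. It uses the sneaker example from the interview. The tiny vocabularies, the keyword rules, and the function name `stream_understanding` are assumptions made for illustration only; Speechly's actual models and SDKs are not described in the interview and will differ.

```python
# Illustrative sketch only: the vocabulary and function names are hypothetical
# and do not reflect Speechly's actual API or models.

COLORS = {"black", "white", "red"}
BRANDS = {"adidas", "nike"}
PRODUCTS = {"sneakers", "boots"}

def stream_understanding(words):
    """Yield an updated snapshot of intent and entities after every word."""
    state = {"intent": None, "entities": {}}
    previous = None
    for word in words:
        token = word.lower().strip(",.")
        if token == "looking":
            state["intent"] = "search"
        elif token == "sort":
            state["intent"] = "sort"
        elif token in COLORS:
            state["entities"]["color"] = token
        elif token in PRODUCTS:
            state["entities"]["product"] = token
        elif token in BRANDS:
            state["entities"]["brand"] = token
        elif previous == "size" and token.isdigit():
            state["entities"]["size"] = int(token)
        elif previous == "by":
            state["entities"]["sort_key"] = token
        previous = token
        # Copy so each snapshot stays fixed while the state keeps changing.
        yield word, {"intent": state["intent"],
                     "entities": dict(state["entities"])}

utterance = "I'm looking for black sneakers from Adidas size 13, sort them by price"
snapshots = list(stream_understanding(utterance.split()))
for word, snapshot in snapshots:
    print(f"{word:>10} -> {snapshot}")
```

In this toy version a UI could already show a "black" filter after the fourth word, long before the sentence ends. A real system would rely on statistical models rather than keyword rules and would handle several intents in one utterance more carefully than the simple overwrite used here.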
Roland Siebelink: Does it mean that you're competing primarily against, let's say these platforms by Google and other big players in your product setup and in your markets?
Otto Söderlund: Yes, I would say so. If you look at what teams used before when they tried to build the kind of thing they can build with our tools, it is typically the tools from the big players. They've tried to combine those tools, taking speech recognition from here, taking natural language understanding from here, taking some visual components from here, and then bringing them together to create something similar to what you could very easily create with our solution. Those are what we usually consider our competitors.
Roland Siebelink: Okay. It also sounds like whereas your customers in the past may have cobbled together their own solution from two, three different providers, that there is an integration component that makes their life a lot easier. Would that be correct?
Otto Söderlund: Yes. The integration component is definitely one part, making it easy to integrate. And the second part is the fact that no other provider has the ability to extract meaning from speech in real time. That sounds like a small technical feature, but the big revelation - and it's something that took us by surprise as well - is that this feature actually allows you to rethink how you design the user interfaces. Because it allows you to give the users real-time confirmation of their speech.
Roland Siebelink: Let's move a little bit, Otto, to the go-to-market vision behind Speechly. This is your product as you've just described, so who's your target group? How are you reaching them? And how far have you got them so far?
Otto Söderlund: The target group for our product is really all product and development teams that are building amazing products with end-user use cases that involve end users giving some more complicated or more repetitive information to the system. That's our target group. As for how we address these target groups, we are a very tech-oriented company. Our team consists of people who have been creating the previous generations of these technologies, like many of these assistants that shall not be named here. And we have a very, very practical, fact-oriented and tech-oriented approach to bringing the technology to the market. We really want to create the best tools for these teams to work with. Instead of investing huge amounts of money in marketing and making a huge noise about ourselves, it's really about being able to provide the best tools to give these teams an unfair advantage. And unfair advantage means being able to find a new technology that can make their product radically better. It's the traditional product-driven-growth approach, more of that rather than paid advertising or branding or anything like that. It's also because our target group is in a way quite specific. That really enables us. And we are, of course, all the time on the lookout for product teams that are building something unique. So we are trying to build the tools and make them easy for the product teams to find. And, of course, we are also trying to identify the coolest products out there that could really benefit from our technology.
Roland Siebelink: Otto, can you talk a little bit about some early customers or how did you get your first customers in the first place? People are often looking to get over that hump of acquiring your first customers, especially in this kind of B2B context, right? How'd you guys achieve that?
Otto Söderlund: I would say that we've gone through two phases of our evolution. I would even call it a pivot in between. In the early stage, we were exploring these voice - what kind of end-user problems can better be solved with the voice-enabled solution? And in this time we were working directly with enterprises and companies. And we come from Finland and we were working with many of the local industry leaders here, solving their problems. And we were able to build really cool voice experiences for retailers, for banks, for media companies, for education companies, for healthcare providers. And that was the first phase of our growth. And it was really about being able to find use cases and work with the best, most-innovative companies in the region.
Roland Siebelink: Almost acting like a development shop for them?
Otto Söderlund: Yeah. That was pretty much customizing and really exploring the problem because that's usually when you're trying to find the product-market fit. How can you do that if you don't have access to the end-users - if you're trying to do it from their office? That was the first phase of our evolution.
Roland Siebelink: If I may just ask about that first phase, Otto, was there already a product you had in the backend, even if customers may not have realized it, or were you just doing project work to basically explore the problem before you even started productizing it?
Otto Söderlund: Early on, I think for the first year, or even a year and a half, it was mostly using existing solutions out there, trying to build what we were envisioning with the existing products and trying to find hacks. It was sometimes quite frustrating to get them to work, and that actually validated for us the problem with the existing solutions out there. That really gave us the confidence to start building the MVP of our own tech stack. And then we built the MVP of our own tech stack and were using that to solve our customers' problems. And then we started to see that, "Hey! This actually works" - because originally it was, of course, a hypothesis. Our original hypothesis was that if we can decrease the delay in the feedback of the voice UI to the user, we can dramatically improve the user experience. And when we had the first MVP, we were starting to see that "Damn, this actually works really well." That's when we raised some VC funding and then we pivoted from this vertical, very client-centric approach of exploring the problem through customization. I wouldn't even necessarily call it a pivot - I'm not sure if it's a pivot or just the normal evolution of a startup.
Roland Siebelink: Oh, who can say, right? Very often the evolution to one person looks like a pivot to the other. Absolutely.
Otto Söderlund: Exactly, exactly. And then we have now spent a couple of years building and finalizing the core technology and then packaging it. Of course, that has taken quite a lot of effort and resources because it's technology that isn't in the market and it's very, very complicated to do. That's one of the reasons why none of the other players have introduced this kind of technology to the market. And we then started to package it for these development teams to be able to use themselves. That's really when we started the product phase of our journey, really building the product. And now we are in closed beta, so we opened around the summer this year for development teams to do self-service. We still have clients that we've worked with directly in production on our platform. But we have now pivoted towards this development-team-centric approach rather than onboarding final production customers ourselves, and we now have development teams that are working independently with our technology.
Roland Siebelink: Okay, last question before we close. You mentioned your co-founder Hannes, who worked on one of those famous voice assistants, as you said, right? How have you guys been building the company and how do you divide the work? How do you make sure you have the best possible relationship with your co-founder is really my question here.
Otto Söderlund: That's an excellent, excellent question because one of the things that I was thinking of sharing was also the importance of this co-founder relationship. We actually go back a pretty long time. We started our studies around 20 years ago. Really, the key to the co-founder relationship is mutual trust. Nobody is perfect. We all have our strengths. We all have our weaknesses. And in our case, we had both already seen quite a lot of the world. We had been building big international businesses. We had been working with big, major, global brands. For us, it was really about two senior people coming together. It's about valuing each other's experiences but also being able to expose our vulnerabilities and learning points to each other. And something that I appreciate very much in my co-founder is the fact that we are each other's mentors. We share our strengths and our weaknesses, and we constantly give each other feedback in a very direct, constructive, and friendly way. That's something that I respect very much. We talk about difficult things. We talk about our families. We talk about how we sleep, how we exercise. We talk about everything. In my previous startup, the co-founding team called ourselves the family, in a way that the company was the baby. Now, with Hannes, we are having this family.
Roland Siebelink: Like the parents, right? Exactly. And it sounds like you have a better relationship than many people would have in their real relationship with their real partners. The amount of openness you show to each other seems quite amazing and I think absolutely best practice, right? To build up that trust with each other, and act as each other's accountability buddy in a sense, right?
Otto Söderlund: Absolutely. Absolutely.
Roland Siebelink: Very briefly, Otto, on your vision for Speechly: you talked a lot about the product, but where do you want the company to be 10 years down the road?
Otto Söderlund: We envision being the technology of choice for teams that are building amazing products and adding voice and other intuitive user interface elements inside their products in the future. We're thinking of ourselves as the Twilio for voice UI. In the future, billions of people will be using those functionalities in all kinds of applications, services, and hardware, and they will be really happy and satisfied with them. And they will have no clue it's all powered by Speechly. But the product teams will.
Roland Siebelink: Very good. Yes. I like it. Be very focused about your product target group. If you're a B2B brand, you don't have to have a consumer brand, right? Otto Söderlund, CEO and founder of Speechly, if people are interested in finding out more about Speechly, what are you looking for? What could they help you with and where should they go to figure out more about you guys?
Otto Söderlund: Cool. We are always looking for really forward-leaning people in product teams and development teams who really want to try new solutions and want to dramatically improve their products and services. If you have a product where the users are required to input quite a lot of information or need to go through repetitive tasks, and you are open and willing to try new things, please get in touch with me. I'm happy to find a good way to help you. And to find more information about Speechly, you can go to our website: speechly.com. You can also go to our GitHub. There are a lot of good code examples there. And you can also find some good documentation on our website for developers, for designers, for product teams. Those are probably the main points of information. Of course, you can follow us on Twitter and LinkedIn, where there's a lot of information about the voice industry overall.
Roland Siebelink: Very good. With the sneak preview of the rounds that may be coming down the pike, I'm sure that people who would like to work with Speechly might also see some opportunities come up in the near future. What kinds of profiles are you typically looking for, Otto?
Otto Söderlund: We are looking to strengthen our team in the speech recognition and natural language understanding parts. That's one. Then deep learning, that is also quite important. Then software development. I would say that those are the main technical roles, and it's maybe good to mention that our team philosophy is to build a very flat team of experts. Really senior people wanting to change the world, working together as a team of equals tackling very hard problems, rather than a huge number of people. We are very obsessed with the quality of people and we are trying not to build our team too fast or too big. We're trying to keep it as small as possible. But of course, it's not easy when you're growing fast, so you have to be able to scale it. But we are extremely selective about how we grow the team.
Roland Siebelink: Also, absolutely best practice, I would say in keeping the team a little bit smaller than feels comfortable. But then making sure everyone on the team has a huge impact on the end result. Absolutely.
Otto Söderlund: Exactly, exactly. And really combining people from different fields. We're also looking for product people and service designers to join the team. That's also very important. People who want to change how people and machines interact.
Roland Siebelink: And are you hiring primarily locally around the Helsinki area or are you also hiring remote workers these days?
Otto Söderlund: Primarily, we would want to build an on-site team, leveraging all the possible physical support, equipment to work efficiently, solving very complex problems. But we are, of course, quite realistic about supply and demand of amazing people. We are also open for really great people to join us on a remote working basis. And of course, during the COVID times, our team is also working - we are fortunate enough to be able to be at the office here in Finland. The situation is good enough for that. But we are, of course, working partially remotely and did work fully remotely too.
Roland Siebelink: Okay. Very good. Well, thank you so much for joining the Silicon Valley Momentum Podcast this week, Otto Söderlund, the founder and CEO of Speechly. This was a great interview and thank you once again.
Otto Söderlund: Thank you, Roland. It was great!
Roland Siebelink: Thank you everyone for listening and see you back next week.
Roland Siebelink talks all things tech startup and brings you interviews with tech cofounders across the world.