Are you satisfied with modern voice technology as we move into 2021? Otto
Söderlund, the CEO and co-founder of Speechly, still sees a lot of shortcomings
in the technology. That’s why his startup has
developed technology that performs speech recognition and natural language
understanding in real-time. In other words, Speechly is doing things with voice
technology that the big players in that space have yet to do.
Otto was a guest on the Silicon Valley Momentum Podcast this week with Roland
Siebelink. In addition to talking about Speechly’s incredible technology, Otto
shared his vision for the future of voice technology and who stands to benefit
the most from Speechly’s core technology. Topics include:
- How product and development teams can integrate Speechly into their products.
- Why being tech-oriented has allowed Speechly to reduce its focus on marketing.
- How serving as a development shop was key in Speechly ultimately finding product-market fit.
- Why Otto believes that mutual trust is the biggest key to a successful co-founder relationship.
- Otto’s belief that co-founders are a family and their company is their “baby.”
Roland Siebelink: Hello, and welcome to the Silicon Valley Momentum Podcast. My name is Roland
Siebelink and I'm a coach and scaleup ally for tech founders. Today, with us, is Otto of Speechly. Tell me a
little bit more about Speechly.
Otto Söderlund: Thanks, Roland. Great to be on the show today.
At Speechly, we are a European startup focusing on the voice space. Our fundamental belief is that voice is
the next paradigm shift in user interaction, a bit like the touchscreen was the last big paradigm shift, at
least the last one I remember. I didn't live through the mouse and keyboard shifts. But I think the next
big, dramatic shift is this one.
We've seen a lot of buzz around voice technology in the last year. I think everybody's seen lots of these
charts with voice technology going through the roof: smart speaker adoption, assistant usage, and all of
that. But when I'm using my regular voice technology, it still sucks. It doesn't really work. And I think
people can relate to that. People use their voice technologies for quite mundane tasks, like controlling
their home or something like that. But we really haven't yet seen a killer application for voice.
Roland Siebelink: Okay. Very interesting. Tell me a little bit more, before we go into the specific
products that you offer, how did you land on this space? What's the history behind Speechly?
Otto Söderlund: Great that you asked. Speechly was founded a couple of years ago, actually four to
be precise, when my co-founder, Hannes, returned from his work developing the natural language understanding
for Siri. In that work, he came to realize the strengths and also the shortcomings of the current generation
of voice technologies. And he had a big dream. We are actually university buddies and go back a long time
and he brought together some of his all-time dream team buddies to think about this problem. And that's when
we started to really explore this problem in more depth.
And then we wanted to create better voice UIs. We initially started doing that with the existing
technologies on the market. But quite soon we realized that they just weren't good enough to
build what we had envisioned. That's why we embarked on developing our own technology. And I think the
biggest insight that led us to build Speechly, and especially to invest in building our own technology, was
realizing that accuracy was no longer the issue in voice technology. It's actually the...
If you use any assistant or smart speaker, you probably realize how they work: when the user
speaks, the assistant or smart speaker waits for the user to finish, and only when the user finishes does it
start to process the speech. It first transcribes the speech into text, then processes
the text with natural language understanding to extract the meaning, and then applies some basic logic to
figure out what to do in the specific application. Typically, it then generates a
response using a synthetic voice.
This is due to technical reasons, to how these technologies have been built. And it leads to
a turn-based interaction paradigm where the user is limited to asking a question, waiting, getting an answer,
waiting, asking again. That limits the use cases. It works really well for tasks like playing music on
Spotify or controlling your home automation.
But if you want to do something more complicated, something that creates a bit more value than saving
five seconds, we really haven't seen that yet. And that's the problem we're solving. We really want to
enable people to interface with technology using their voice combined with other modalities, and to be
able to solve more complex tasks using speech as the main control mechanism.
Roland Siebelink: Let's build a little bit on that vision of more complex applications, maybe, as you
said, with other modalities. What are you envisioning? Can you give us a few examples of use cases
where you see a need but that are currently not yet supported by the standard...
Otto Söderlund: Yes. Sure. One example is all kinds of end-user applications where the user needs to
give the system a lot of information and input in an efficient way. Let's
take shopping as an example: you want to find some products. You could be looking for certain
brands, or, if you're talking about fashion, certain sizes, colors, whatever. There's a lot of
information that needs to be passed. And this usually involves a lot of clicking, browsing, and
interacting with many UI elements. I would imagine that finding a product can take many minutes and many
clicks before you actually find what you're looking for.
That's an example of an information-heavy task that you could replace by just saying what you're looking
for: "I'm looking for black sneakers from Adidas, size 13, sort them by price." Or shopping for
groceries, which is also quite repetitive.
Our solution consists of two parts. The first is our own core technology. We've developed core
technology that does both speech recognition and natural language understanding. But what sets it apart
from all the other technologies out there is the fact that it does all of that in real time. The second
the user starts speaking, our system immediately starts not only transcribing the speech but also making
sense of it, extracting the natural language understanding intents and entities from it.
And that is the secret magical ingredient that actually allows us to enable voice user interfaces that react
in real time. We want to enable voice interfaces that mimic human face-to-face communication where they
actually react in real time to what the user is saying. The moment I've given the system sufficient
information for it to be quite certain about my intents or entities, it can immediately react to that.
Let's say that I'm talking about booking a sushi restaurant in New York. While I'm still speaking, the system
can already visually confirm to me that it understood what I'm saying. And if it makes a
mistake or if I change my mind, I can very easily correct it. That's the first
part: our own core technology, which is quite unique in the market.
And then the other part is a platform that allows product and development teams to very easily plug into our
core technology and integrate it into their own products and services. If you are a product team
asking how you can improve the user experience of your product or service, and the
task requires users to give the system quite a lot of input, then we have the tools for your
team to find clever ways for end users to use the product more efficiently and more
satisfyingly. We have tools for designers to design these kinds of user experiences.
And we have tools for developers to integrate very easily with our platform, including SDKs
for quite a lot of different development frameworks.
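To make the contrast between the turn-based pipeline and real-time understanding more concrete, here is a minimal Python sketch. It is not Speechly's technology or API, neither of which is described in this interview: the keyword-matching `extract` function, the slot names, and the hard-coded partial transcripts are all hypothetical stand-ins for what a streaming speech recognizer and NLU model would produce. The sketch only illustrates timing: the turn-based version returns one result after the final transcript, while the streaming version reports each intent or entity as soon as it appears in a partial transcript.

```python
# Hypothetical toy vocabularies for a restaurant-booking domain.
# A real system would use trained speech recognition and NLU models instead.
INTENT_WORDS = {"book": "book_restaurant"}
ENTITY_WORDS = {
    "sushi": ("cuisine", "sushi"),
    "new york": ("city", "New York"),
    "tonight": ("date", "tonight"),
}


def extract(text):
    """Toy NLU: keyword-match an intent and entities in a transcript."""
    lowered = text.lower()
    intent = next(
        (name for word, name in INTENT_WORDS.items() if word in lowered), None
    )
    entities = {
        slot: value
        for phrase, (slot, value) in ENTITY_WORDS.items()
        if phrase in lowered
    }
    return intent, entities


def turn_based(partials):
    """Wait until the user has finished, then run NLU once on the final text."""
    return extract(partials[-1])


def streaming(partials):
    """Run NLU on every partial transcript; emit each new finding immediately."""
    seen = set()
    events = []  # (index of the partial transcript, slot name, value)
    for step, text in enumerate(partials):
        intent, entities = extract(text)
        findings = [("intent", intent)] if intent else []
        findings += list(entities.items())
        for slot, value in findings:
            if (slot, value) not in seen:
                seen.add((slot, value))
                events.append((step, slot, value))
    return events


# Hypothetical partial transcripts, as a streaming recognizer might emit them.
partials = [
    "book",
    "book a sushi",
    "book a sushi restaurant in new",
    "book a sushi restaurant in new york",
    "book a sushi restaurant in new york tonight",
]

for step, slot, value in streaming(partials):
    print(f"after partial {step}: {slot} = {value}")
print("turn-based result:", turn_based(partials))
```

Running it, the streaming version reports the intent after the first partial, the cuisine after the second, the city after the fourth, and the date after the fifth, so a UI could confirm each item while the user is still talking. The turn-based call returns the same information only once, after the last partial. Handling corrections (the user changing their mind mid-sentence) is deliberately left out of this toy.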
Roland Siebelink: Does that mean you're competing primarily against, let's say, the platforms by
Google and other big players, both in your product setup and in your markets?
Otto Söderlund: Yes, I would say so. If you look at what teams used before when they tried to build something
that they could build with our tools, it is typically the tools from the big players. They've
tried to combine those tools, taking speech recognition from here, natural language understanding
from there, some visual components from somewhere else, and then bringing them together to create something
similar to what you could very easily create with our solution. That's what we usually consider our
competition.
Roland Siebelink: Okay. It also sounds like, whereas your customers in the past may have cobbled
together their own solution from two or three different providers, there is an integration component that
makes their life a lot easier. Would that be correct?
Otto Söderlund: Yes. The integration component, making it easy to integrate, is definitely one part.
The second part is the fact that no other provider has the ability to extract meaning in real time.
That sounds like a small technical feature, but the big revelation, and
something that took us by surprise as well, is that this feature allows you
to rethink how you design user interfaces, because it allows you to give users
real-time confirmation of their speech.
Roland Siebelink: Let's move a little bit, Otto, to the go-to-market vision behind Speechly. This is
your product as you've just described it, so who's your target group? How are you reaching them? And how far
have you gotten so far?
Otto Söderlund: The target group for our product is really all product and development teams that
are building amazing products that have end user use cases that involve end users giving some more
complicated or more repetitive information to the system. That's our target group. And how we address these
target groups is really that we are a very tech-oriented company. Our team consists of people who have been
creating the previous generations of these technologies, like many of the assistants that shall not be named.
And we have a very practical, fact-based, and tech-oriented approach to bringing the technology to the
market. We really want to create the best tools for these teams. Instead of investing huge
amounts of money in marketing and making a huge noise about ourselves, it's really about
providing the best tools to give these teams an unfair advantage. And an unfair advantage means
being able to find a new technology that can make their product radically better.
It's more of a product-driven growth approach to marketing, rather than paid advertising, branding, or
anything like that. It also works because our target group is, in a way, quite specific. That really
enables us. And we are, of course, always on the lookout for product teams that are building something
unique. We are trying to build the tools and make it easy for product teams to find them. And, of
course, we are also trying to identify the coolest products out there that could really benefit from
our technology.
Roland Siebelink: Otto, can you talk a little bit about some early customers, or how you got your
first customers in the first place? People are often looking to get over that hump of acquiring their first
customers, especially in this kind of B2B context, right? How'd you guys achieve that?
Otto Söderlund: I would say that we've gone through two phases in our evolution. I would even call
it a pivot in between. In the early stage, we were exploring what kinds of end-user problems
can be better solved with voice-enabled solutions. And during this time we were working directly with
enterprises and companies. We come from Finland, and we were working with many of the local industry
leaders here, solving their problems. And we were able to build really cool voice experiences for retailers,
for banks, for media companies, for education companies, for healthcare providers. And that was the first
phase of our growth. And it was really about being able to find use cases and work with the best,
most-innovative companies in the region.
Roland Siebelink: Almost acting like a development shop for them?
Otto Söderlund: Yeah. That was pretty much customizing and really exploring the problem, because
that's usually the stage when you're trying to find product-market fit. How can you do that if you don't have
access to the end users, if you're trying to do it from your office? That was the first phase of our...
Roland Siebelink: If I may just ask about that first phase, Otto, was there already a product you
had in the backend, even if customers may not have realized it, or were you just doing project work to
explore the problem before you even started productizing it?
Otto Söderlund: In the early days, I think for the first year, or even a year and a half, we were mostly
using existing solutions out there, trying to build what we were envisioning with those products
and trying to find hacks. Getting them to work was sometimes quite frustrating, and that
actually validated for us the problem with the existing solutions. That really gave us the
confidence to start building the MVP of our own tech stack.
And then we built the MVP of our own tech stack and were using it to solve our customers' problems. And
then we started to see, "Hey! This actually works," because originally it was, of course, just a
hypothesis. Our original hypothesis was that if we can decrease the delay in the feedback that the voice UI
gives to the user, we can dramatically improve the user experience. When we had the first MVP, we
started to see, "Damn, this actually works really well." That's when we raised some VC funding, and then
we moved away, I wouldn't even call it a pivot, from this vertical, very client-centric approach of
exploring the problem and customizing. I'm not sure if it's a pivot or just evolution in our normal...
Roland Siebelink: Oh, who can say, right? Very often the evolution to one person looks like a pivot
to the other. Absolutely.
Otto Söderlund: Exactly, exactly. And then we spent a couple of years building and finalizing the core
technology and, of course, packaging it. That has taken quite a lot of effort and resources, because
it's technology that isn't on the market yet and it's very, very complicated to build. That's one of
the reasons why none of the other players have introduced this kind of technology to the market. We
then started to package it so that development teams could use it themselves. And that's really when
we started the product phase of our journey, really building the product. Now we are in closed beta:
we opened it around the summer this year for development teams to use on a self-service basis. We still
have clients that we've worked with directly who are in production on our platform. But we have now
pivoted towards this development-team-centric approach rather than onboarding production customers
ourselves, and we now have development teams that are working independently with our technology.
Roland Siebelink: Okay, last question before we close. You mentioned your co-founder Hannes, who
worked on one of those famous voice assistants, as you said, right? How have you guys been building the
company and how do you divide the work? How do you make sure you have the best possible relationship with
your co-founder is really my question here.
Otto Söderlund: That's an excellent, excellent question, because one of the things that I was
thinking of sharing was the importance of this co-founder relationship. We actually go back a
long way. We started our studies around 20 years ago. Really, the key to the co-founder relationship is
mutual trust. Nobody is perfect. We all have our strengths. We all have our weaknesses. And in our
case, we had both already seen quite a lot of the world. We had been building big international businesses.
We had been working with major global brands. For us, it was really about two senior people coming together.
It's about valuing each other's experiences but also being able to expose our vulnerabilities and
learning points to each other. Something that I appreciate very much about my co-founder is the fact that we are
each other's mentors. We share our strengths and our weaknesses, and we constantly give each other
feedback in a very direct, constructive, and friendly way. And that's something that I really...
We also talk about difficult things. We talk about our families. We talk about how we
sleep, how we exercise. We talk about everything. In my previous startup, we co-founders called
ourselves a family, in the sense that the company was the baby. Now, with Hannes, we have this...
Roland Siebelink: Like the parents, right? Exactly. And it sounds like you have a better
relationship than many people would have in their real relationship with their real partners. The amount of
openness you show to each other seems quite amazing and I think absolutely best practice, right? To build up
that trust with each other, and act as each other's accountability buddy in a sense, right?
Otto Söderlund: Absolutely. Absolutely.
Roland Siebelink: Very briefly, Otto, your vision for Speechly: you talked a lot about the product,
but where do you want the company to be 10 years down the road?
Otto Söderlund: We envision being the technology of choice for teams that are building amazing
products and want to add voice and other intuitive user interface elements to their products in the
future. We're thinking of ourselves as the Twilio for voice UI. In the future, people will happily be
using those functionalities in all kinds of applications, services, and hardware, and they'll be really
satisfied with them. And they'll have no clue it's all powered by Speechly. But the product teams will.
Roland Siebelink: Very good. Yes. I like it. Be very focused on your product's target group. If
you're a B2B brand, you don't have to have a consumer brand, right?
Otto Söderlund, CEO and founder of Speechly, if people are interested in finding out more about Speechly,
what are you looking for? What could they help you with, and where should they go to find out more?
Otto Söderlund: Cool. We are always looking for really forward-leaning people in product teams
and development teams who really want to try new solutions and dramatically improve their products
and services. If you have a product where users are required to input quite a lot of
information or go through repetitive tasks, and you are open and willing to try new things,
please get in touch with me. I'm happy to find a good way to help you.
And to find more information about Speechly, you can go to our website: speechly.com. You can also go to
our GitHub. There are a lot of good code examples there. You can also find good documentation on our
website for developers, designers, and product teams. Those are probably the main sources of information. Of
course, you can follow us on Twitter and LinkedIn, and there's a lot of information about the voice industry...
Roland Siebelink: Very good. With the sneak preview of the rounds that may be coming down the pike,
I'm sure that people who would like to work with Speechly might also see some opportunities come up in the
near future. What kinds of profiles are you typically looking for, Otto?
Otto Söderlund: We are looking to strengthen our team in speech recognition and natural language
understanding. That's one. Then deep learning, which is also quite important. Then software
development. I would say those are the main technical roles. And it may be good to mention that our team
philosophy is to build a very flat team of experts: really senior people wanting to change the world,
working together as a team of equals tackling very hard problems, rather than a huge number of people.
We are obsessed with the quality of our people, and we are trying not to build our team too fast or too big.
We're trying to keep it as small as possible. But of course, it's not easy when you're growing fast, so you
have to be able to scale. But we are extremely selective in how we grow the team.
Roland Siebelink: That's also absolutely best practice, I would say: keeping the team a little bit
smaller than feels comfortable, but then making sure everyone on the team has a huge impact on the end...
Otto Söderlund: Exactly, exactly. And really combining people from different fields. We're also
looking for product people and service designers to join the team. That's also very important. People who
want to change how people and machines interact.
Roland Siebelink: And are you hiring primarily locally around the Helsinki area or are you also
hiring remote workers these days?
Otto Söderlund: Primarily, we want to build an on-site team, leveraging all possible
physical support and equipment to work efficiently on very complex problems. But we are, of course, quite
realistic about the supply and demand of amazing people, so we are also open to really great people joining us on
a remote basis. And of course, during COVID times, we are fortunate
enough to be able to be at the office here in Finland; the situation is good enough for that. But we are, of
course, working partially remotely, and at times we did work fully remotely too.
Roland Siebelink: Okay. Very good. Well, thank you so much for joining the Silicon Valley Momentum
Podcast this week, Otto Söderlund, the founder and CEO of Speechly. This was a great interview, and thank you.
Otto Söderlund: Thank you, Roland. It was great!
Roland Siebelink: Thank you everyone for listening and see you back next week.
Roland Siebelink talks all things tech startup and brings you interviews with tech co-founders
across the world.