An Ethical Approach to Synthetic Voice: Why we invested in Resemble AI

Javelin VP
Jul 13, 2023

Predicted to grow GDP by at least $7 trillion, Generative AI has ushered in, and will continue to usher in, a new wave of invention and creativity. We at Javelin believe that a golden age of start-up innovation is upon us: small, creative teams, armed with the tools of Generative AI, will be able to compete with large incumbents more than ever before, and in a capital-efficient way.

The challenging part of investing in Generative AI is distinguishing emerging products that are truly differentiated and defensible from short-term hype, since many companies are built via easy integrations with horizontal LLM platforms like OpenAI or Bard. Early-stage investors need a keen lens and a framework to invest in AI at this moment. At Javelin, our framework is three-pronged:

  1. Unpack drivers of growth to decode what’s real vs. hype-generated: Javelin has over a decade of experience investing in the AI/ML space. Exposure and experience investing in platforms like Viable, Skytree (The Machine Learning Company), Sense Networks, Prismatic, Fem Inc., and others provide us with a unique lens when evaluating new investment opportunities and dissecting future sustainable growth.
  2. Understand the scope of future potential market demand: our existing portfolio spans both iconic consumer companies (MasterClass, Thumbtack, Pair Eyewear, Carbon Health, Niantic Labs, Mythical Games) and must-have B2B software (RxVantage, Stensul, Higharc, Sequel), all of which are voracious customers of new AI technology ranging from personalization to core product feature extensions. This gives us a proprietary channel to vet new AI vendors extensively.
  3. Identify whether ethical standards in the use of AI are a top, if not the top, priority for the Company: Javelin’s investment philosophy is to back technology platforms that work for people instead of against them. The burgeoning AI space is especially ripe for potential ethical abuses. Legal frameworks around original IP management and artist monetization within the context of Generative AI are already top of mind for regulators, and we will see more examples of individuals like Sarah Silverman suing players such as Meta and OpenAI for copyright infringement. We see this as a massive opportunity to invest in AI founders building products with an ethics-first approach, where copyright processes built into the workflow from the get-go become a significant competitive advantage.

This set of criteria led us to take a very close look at the synthetic voice space. Synthetic voice as a technology has been around for some time, starting with use cases in the entertainment industry and movies, so we knew there would be a deep well of know-how and technological depth to dig into. Our portfolio company, MasterClass, has long been evaluating vendors in this space, which gave us a customer’s perspective on critical needs and requirements. And lastly, the space is rife with well-publicized ethical abuses and with worry from enterprises and the broader public, which we view as an opportunity to build a product and a company that addresses these issues head on.

Enter Zohaib Ahmed and Saqib Muhammad, co-founders at Resemble AI, a generative voice AI platform creating high-quality, realistic-sounding synthetic voices (either from scratch or emulated), with appropriate guardrails on copyright and misuse.

Co-Founders, Zohaib Ahmed (CEO, Right) and Saqib Muhammad (COO, Left) with their Webby Award

We first came upon Resemble in 2019, when our Principal, Tasnia Huque, was at Warner Brothers. Her interest in how large media organizations think about IP, including that of “voice”, landed Resemble on her desk. Back in 2019, generative AI was not part of the everyday vernacular as it is today. But Resemble was building high-quality generative voice AI, and the potential of the technology was palpable.

At that time, Resemble AI was in its early days, with scalable features and the ethics at the core of the experience still being built. Fast forward to 2023, and Resemble has become a category leader with the most scalable and ethically minded product on the market, with many large enterprise customers and over 1 million users. We are thrilled to be partnering with Zohaib and Saqib, and to be leading Resemble’s Series A.

While media and entertainment has been a key vertical for Resemble since its inception and is a large market by itself, what really excited us is the need for scalable voice technology across many end markets and functions. That need extends not only across several use cases in entertainment (production, neural audio editing, language dubbing) but also across gaming, education, advertising, government, automotive, customer success, sales conversations, and more. Across both individual and brand voices, Resemble has been used to create hundreds of thousands of AI voices.

Enterprises are choosing Resemble because it offers a comprehensive suite of features, including IP and copyright protection, that serves all their generative audio requirements, eliminating the need for expensive recording equipment, custom workflows, or voice professionals. Key features in Resemble’s platform include:

  • AI Watermarker: detects whether your audio data has been used to train Generative AI models
  • AI Detection: technology that identifies if voice content was generated by AI
  • IP Protection: IP and copyright permission provisioning workflow
  • Text-to-Speech: large language model-enabled text-to-speech for natural-sounding voices out of the box
  • Speech-to-Speech: one voice to another with control over the style and performance
  • Neural Audio Editing: edit existing audio with Resemble Fill
  • Language Dubbing: transform AI voices into 60+ languages
  • Fictitious Voices: type in a kind of voice and generate dozens of fictitious voices
  • Prompt Engineering: “type-in-emotion”, for example “make it dramatic” makes the voice dramatic
  • Zero Shot Voice Cloning: generate high quality AI voice with 3 samples only
  • Real-time Speech-to-Speech: real time experiences for developers — real generative voice skins, not just filters

Resemble’s feature set around AI detection and copyright processes is instrumental in letting an enterprise use the technology without worrying about copyright infringement. Resemble has put workflow frameworks in place to stay on the right legal side of Generative AI use cases, and its ethics-first approach was a key component of our investment thesis. We are acutely aware of potential misuses of the technology, and so is the company. To mitigate risks, Resemble pioneered audio watermarking technology to identify any AI-generated audio (the only platform in market today to be able to do so) and is quickly becoming the key thought leader in proper use and best practices of voice AI. The Company ensures all voices utilized are cleared for any potential copyright issues and has baked appropriate guardrails into its overall workflow to scale spam-and-fraud-free use. More on their Deep Fake Detector technology here.

In addition, Resemble pursues an SDK/API-first distribution strategy, embedding its product in a customer’s existing workflow processes, which not only makes the product sticky but also allows faster onboarding and time-to-value. Once integrated into a customer’s specific workflow processes, Resemble’s audio models keep improving, which continuously raises the quality of the voice.

Users today can sign up for a free trial and get free cloned voices, and several of these users move down the funnel to become qualified paying-customer leads. The freemium inbounds help the Company learn early and quickly which future use cases customers want. Given the nascent nature of the space, this gives Resemble an edge in identifying market needs without additional spend on customer discovery. In the last 12 months, over 395 million total API calls have been made to Resemble for programmatic use and 1.1 billion seconds of audio created.

The Company’s go-to-market focus on both PLG and top-down sales gave us confidence that this business could scale quickly. Most competitors in the space pursue bespoke or project-based work, or are focused on a specific feature (text-to-speech or dubbing only, for example), a less holistic approach that makes it harder to quickly grow and retain market share.

Resemble’s future vision and growth to date have inspired a fantastic investor base to come together for their Series A round. We are excited to lead the round and collaborate with Comcast Ventures, Craft, Ubiquity, and several prominent angels including Julius Genachowski, the former chair of the FCC, Qasar Younis, CEO of Applied Intuition, and Matt Rutler from MasterClass. We look forward to partnering closely with Zohaib and Saqib to help them build a transformational company, one that becomes a true category leader and sets the tone for responsible uses of this groundbreaking technology for individuals and large enterprises alike.

For more on Resemble, visit their website and Twitter, and check out their careers page here — they are hiring!

- Tasnia Huque and Alex Gurevich
