Audio AI Specialist (Speech Data, Quality & Annotation)

HireArt

Software Engineering, Data Science, Quality Assurance
Springfield, VA, USA · Remote
USD 25 / hour
Posted on Mar 23, 2026

A Fast-growing Startup Building Safer, More Reliable AI

Location

United States (Remote)

Work environment

Remote

Expected pay amount

25.00 USD Per Hour

Schedule

(N/A)

Assignment length

Contract

Job description

HireArt is helping our client find an Audio AI Specialist (Speech Data, Quality & Annotation) to support the development of high-quality training datasets for next-generation voice AI models.

In this role, you’ll work hands-on to improve the quality, consistency, and usability of speech datasets across applications such as text-to-speech, transcription, speech-to-speech, ASR, and conversational voice systems. Your work will directly influence how data is collected, reviewed, and delivered for real-world model training.

You will work across three core areas: defining and applying audio quality standards, recording high-quality speech on demand, and performing annotation and QA across speech datasets. This is not a generic audio production role; the work focuses on making audio usable for model training and requires a strong understanding of how data quality affects model performance.

The ideal candidate has direct experience working with audio AI datasets and understands what makes speech data effective for model training. You have a strong ear for audio quality, are comfortable applying annotation standards, and can consistently produce and evaluate high-quality recordings.

As an Audio AI Specialist, you'll:
  • Develop, refine, and apply audio quality guidelines for speech and voice datasets.
  • Review audio files against technical, linguistic, and task-specific standards, making clear approval, rejection, or revision decisions.
  • Identify audio and annotation issues such as background noise, clipping, distortion, plosives, echo, low signal, segmentation errors, transcript mismatches, and speaker-label inconsistencies.
  • Perform annotation and QA tasks, including transcription, timestamp validation, VAD/segmentation, diarization, pronunciation checks, and metadata review.
  • Record speech based on provided scripts and performance guidelines, delivering natural, high-quality, specification-compliant audio.
  • Document edge cases, update review rubrics, and improve internal SOPs and quality standards.
  • Collaborate with research, ML, and operations teams to translate model requirements into data specifications and evaluation criteria.
  • Ensure consistency and integrity across audio files, transcripts, annotations, and associated metadata.

Requirements

  • Direct experience working with audio AI training datasets or evaluation workflows
  • Hands-on experience with TTS, ASR, transcription, speech-to-speech, or related voice AI systems
  • Experience developing or applying audio quality standards in production environments
  • Experience with speech annotation tasks such as transcription, timestamp QA, VAD/segmentation, and diarization
  • Strong auditory judgment with the ability to consistently identify subtle audio quality issues
  • Ability to produce high-quality recordings in a controlled, quiet environment using professional or near-professional equipment
  • Strong written communication skills with the ability to provide clear, actionable feedback
  • High attention to detail and sound judgment when evaluating edge cases
  • Comfortable working with structured data formats such as spreadsheets, CSV, or JSON

Bonus Qualifications:
  • Experience with audio tools such as Audacity, Praat, or similar
  • Basic scripting skills (Python, Bash, or SQL) for QA or dataset analysis
  • Background in linguistics, phonetics, speech research, or voiceover work
  • Experience evaluating both real and synthetic audio
  • Multilingual experience or familiarity with accents and dialect variation
  • Familiarity with compliant handling of consented and licensed voice data

Commitment: This is an ongoing contract position staffed via HireArt. The role is remote and open to U.S.-based candidates (excluding California and Illinois).

HireArt values diversity and is an Equal Opportunity Employer. We are interested in every qualified candidate who is eligible to work in the United States. Unfortunately, we are not able to sponsor visas or employ corp-to-corp.
