[Remote] Data Engineer
Note: This is a remote job open to candidates in the USA. INSPYR Solutions is seeking a Data Engineer to join its Data Operations Team, supporting machine learning and AI initiatives. The role involves designing and managing data pipelines and ensuring high-quality datasets for various projects while collaborating with a responsive team.
Responsibilities
- Work on 3–4 projects to start, scaling up to 6–10 during peak season
- Contribute to data collection, annotation, and generation pipelines using Python and distributed systems (Spark)
- Collaborate with a tight-knit and highly responsive team, engaging in biweekly check-ins with team leads
- Gain experience with confidential, multimodal, and LLM-related datasets across a high volume of AI/ML projects
- Influence how large-scale datasets are prepared for training models across an enterprise AI organization
Skills
- 2+ years of experience in data engineering or Python development, with a strong foundation in Computer Science or Data Science
- Proficiency in distributed systems (e.g., Spark) and a solid understanding of multithreading vs. multiprocessing
- Demonstrated ability to design scalable pipelines, handle diverse data structures, and manage large-scale workflows
- Comfortable operating under pressure, context-switching across multiple projects, and working with ambiguity
- Strong communication skills and attention to detail are essential
- Must be available to work 9am–4pm PST, even if located in a different state
- Familiarity with Airflow or Spark for pipelines, or with Flask for scalable API/UI development
- Experience with Docker, containerization, and CI/CD tools (e.g., Jenkins)
- Exposure to LLMs, multi-modal data, or generative AI workflows
- Prior involvement in designing tools to automate or scale ML data pipelines
- Ability to collaborate in a high-volume, high-trust environment