Software Engineer, Data Processing & Privacy - 26-00382
Additional Notes: Data privacy and legal environments; working in Python and with Claude; handling and processing PII. Soft skills: attention to detail, reliability, comfort with reviews and audits.
About the role
- The client is seeking a detail-oriented Software Engineer on a contract basis to build and run data processing pipelines for datasets used in our research. You'll take raw, heterogeneous inputs (text, code, documents, structured exports) and turn them into clean, well-structured, privacy-safe outputs ready for downstream use.
- The work spans ingestion, format normalization, data quality, privacy handling (including PII de-identification), and the supporting tooling that makes the pipeline reliable and self-serve. You'll iterate closely with internal teams on QA findings and harden the pipeline so that each new dataset is cheaper to process than the last.
Responsibilities
- Build and extend per-source processing for new data types as they arrive
- Ingest and normalize raw exports across many formats into consistent, well-structured outputs
- Handle privacy requirements, such as PII detection and de-identification, to meet our internal compliance bar
- Run data quality QA: automated checks plus LLM-assisted review to flag gaps, malformed inputs, and incompleteness
- Iterate on internal feedback: root-cause issues, fix, re-run, re-deliver
- Build supporting tools: auditing, data exploration, monitoring, simple search over processed data
- Land cleaned data with the right storage layout and access controls
- Document and harden the pipeline so that each new dataset is cheaper to process than the last
You may be a good fit if you
- Have 4+ years of software engineering experience, with substantial time on data pipelines
- Are a proficient user of Claude / Claude Code for day-to-day engineering and know when to verify its output
- Are genuinely detail-oriented
- Have high integrity and take handling real people's personal data seriously
- Are comfortable with sustained, careful data work and find satisfaction in getting it right
- Can work independently, ship reliably, and communicate clearly about progress and edge cases
- Are proficient in Python and comfortable working across many heterogeneous, semi-structured formats (JSON, NDJSON, code, HTML/XML dumps, archives)
Strong candidates may also have experience with:
- PII detection and anonymization techniques
- Working with large, messy, semi-structured text and code corpora
- Data quality monitoring and validation
- Cloud storage and access-control patterns (S3/GCS, IAM)
- Building internal tools or self-serve data platforms for researchers
- Information retrieval, search, or RAG systems