GCP Data Engineer with ML
Role: GCP Data Engineer with ML knowledge
Location: Remote (preferably NY/NJ)
Type: Contract
Key Responsibilities
Pipeline Development & ETL: Design and deploy robust batch and streaming data pipelines using Cloud Dataflow (Apache Beam) and Cloud Pub/Sub.
Data Modeling & Warehouse: Construct and optimize data models in BigQuery for high-performance analytics and ML model consumption.
MLOps & Deployment: Operationalize ML models developed by data scientists, transitioning models from experimentation to production environments using Vertex AI.
Feature Engineering: Collaborate with data scientists to implement feature engineering pipelines that automate the extraction of features from raw data for training.
Data Security & Quality: Implement data governance, privacy, and security best practices (IAM, Data Loss Prevention) throughout the data lifecycle.
Automation: Automate data workflows and orchestration using Cloud Composer (Apache Airflow).
Monitoring & Optimization: Monitor pipeline performance using Cloud Monitoring and optimize for cost and speed.
Required Qualifications
Experience: 3-5+ years of experience in data engineering, with at least 2 years focused on GCP.
Programming Skills: Expert-level SQL and strong Python programming skills.
GCP Expertise: Proven experience with Cloud Functions, Cloud Run, GCE, GKE, BigQuery, Dataflow, Dataproc, Pub/Sub, Google Cloud Storage, and Vertex AI.
ML Knowledge: Understanding of machine learning fundamentals (training, testing, evaluation, drift) and feature engineering techniques.
Strong understanding of SQL and unstructured data management.
Hands-on experience with Docker, Kubernetes (GKE), and CI/CD tools.
Infrastructure as Code: Experience with Terraform to provision and manage infrastructure.
Education: Bachelor's degree in Computer Science, Engineering, or a related field.
Preferred Qualifications
Certification: Google Cloud Professional Data Engineer certification.
MLOps Specialization: Experience with Kubeflow or Vertex AI Pipelines.
Data Modeling: Strong understanding of data warehouse modeling patterns (Kimball/Inmon).
Key Technologies
GCP Core: Cloud Functions, Cloud Run, BigQuery, Dataflow, Pub/Sub, Composer, Dataproc, Vertex AI.
Languages: Python, SQL.
Frameworks: Apache Beam, Apache Spark.
Tools: Terraform, Git, Docker, Kubernetes.