Hello,
Please find the job description below and, if you are interested, reply with an updated resume.
Data Scientist
Guadalajara, Mexico – Onsite 5 days per week (remote to start)
6-12 Month Contract
Location: Guadalajara, MX – 5 days in the office. The role may start remote, but will move onsite within 3-6 weeks.
Job Description
Responsibilities
- Design and build end-to-end data pipelines: ingestion, transformation, labeling, feature engineering, training, validation, and deployment.
- Develop and maintain reproducible ML research workflows using Jupyter/Colab notebooks and version-controlled environments.
- Lead data labeling strategy: define annotation guidelines, ensure data quality, and automate labeling where possible.
- Conduct exploratory data analysis (EDA) and create scalable datasets for supervised and unsupervised ML.
- Train, evaluate, and optimize ML models, ensuring performance across multiple environments (dev, staging, prod).
- Implement MLOps best practices: experiment tracking, pipeline automation, CI/CD for ML, and monitoring model drift.
- Collaborate with cross-functional teams (ML engineers, software developers, product managers) to translate research outcomes into production-ready solutions.
Qualifications
- PhD in Computer Science, Data Science, Statistics, Applied Mathematics, or related field.
- 3+ years of research and applied industry experience in data science or ML.
- Proven experience in designing, building, and maintaining end-to-end ML pipelines.
- Strong programming skills in Python, with expertise in libraries such as pandas, NumPy, scikit-learn, and TensorFlow/PyTorch.
- Proficiency with notebook-based experimentation and data labeling pipelines.
- Solid understanding of statistical modeling, machine learning algorithms, and data quality management.
- Experience with SQL and data warehouses/lakes for large-scale data processing.
- Hands-on experience with MLOps tools (MLflow, Kubeflow, Airflow, Vertex AI, SageMaker).
- Familiarity with cloud platforms (e.g., GCP) and containerization (Docker/Kubernetes).
- Experience automating data labeling workflows (e.g., active learning, weak supervision).