Open to opportunities · Berlin, Germany

Hi, I'm Sundus

MS Data Science student at FU Berlin. 3+ years building ML systems, neural networks, and data pipelines in Python, PyTorch, and TensorFlow.

Berlin, Germany · FU Berlin · MS Data Science · EU work authorisation
Who I am
About

I've always been drawn to the moment a model stops being math and starts being useful: predicting a sale, summarising a meeting, recognising a face in a crowd. That curiosity has carried me through 3+ years of building ML systems and into an MS in Data Science at Freie Universität Berlin, where I expect to graduate in Oct/Nov 2026. I'm now looking for my next role as a Data Scientist, ML Engineer, or AI Engineer.

Right now I'm deep in my thesis at BIFOLD (TU Berlin), working at the intersection of generative AI and physics: bridging machine learning force fields with diffusion models using message-passing GNNs (SchNet and the equivariant PaiNN) in PyTorch. It's the kind of problem that keeps me reading papers past midnight, and exactly why I want to build a career around ML and LLM engineering.

Outside the thesis, I geek out building things end-to-end: prompting and fine-tuning LLMs, designing agentic workflows, forecasting time-series, training computer vision models, and wiring up the ETL pipelines that quietly hold it all together. Fluent in English (C1), and building my German (A2) one conversation at a time.

Current role
Working Student, ML · Technische Universität Berlin
Thesis
Transfer Learning for ML Force Fields & Diffusion Models · BIFOLD
Core frameworks
PyTorch · TensorFlow · Hugging Face Transformers · Spark MLlib
Languages spoken
English (C1) · German (A2) · Urdu (native)
Career
Experience
Dec 2024 – Present
TU Berlin
Working Student, ML
Technische Universität Berlin · Berlin, Germany
  • Developing a generative model that produces new molecular structures, expanding training data for ML interatomic potentials.
  • Designed and built SpkEnsembleCalculator in SchNetPack with uncertainty quantification over energies, forces, and stress tensors for active-learning pipelines.
  • Refactored the core data pipeline, unifying interfaces across 8 molecular datasets (QM9, MD17, MD22, rMD17, MPtrj, ISO17, QM7x, Materials Project).
  • Built pytest test suites across ensemble, data, and atomistic-offset modules for continuous integration.
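As a rough illustration of the ensemble-based uncertainty idea mentioned above, the sketch below takes the spread across ensemble members as the uncertainty signal and flags a structure for labelling when that spread is large. It is not the SpkEnsembleCalculator interface or any SchNetPack API: the function names, the choice of standard deviation as the measure, and the threshold rule are all assumptions made for this example, written in plain Python.

```python
from statistics import mean, pstdev


def ensemble_stats(predictions):
    """Mean and population standard deviation of one scalar quantity
    (for example the total energy) predicted by each ensemble member."""
    return mean(predictions), pstdev(predictions)


def force_uncertainty(member_forces):
    """Per-atom force uncertainty (hypothetical helper).

    member_forces[m][a] is the (fx, fy, fz) force on atom a predicted by
    ensemble member m. For each atom, return the largest standard
    deviation found over the three Cartesian components.
    """
    n_atoms = len(member_forces[0])
    per_atom = []
    for a in range(n_atoms):
        component_stds = [
            pstdev([member[a][c] for member in member_forces])
            for c in range(3)
        ]
        per_atom.append(max(component_stds))
    return per_atom


def needs_labelling(member_forces, threshold):
    """Assumed active-learning rule: request a reference calculation when
    any atom's force uncertainty exceeds the threshold."""
    return max(force_uncertainty(member_forces)) > threshold


# Toy values, invented for illustration only.
energies = [-1.0, -1.2, -0.8]
forces = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],  # member 0
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],  # member 1
]
energy_mean, energy_std = ensemble_stats(energies)
flagged = needs_labelling(forces, threshold=0.5)
```

In this toy case the two members disagree on the x-component of the second atom's force, so the structure is flagged; with a threshold of 2.0 it would not be.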
Oct 2021 – Oct 2023
Lyftrondata
Software Engineer
Lyftrondata · US, Remote
  • Engineered scalable ETL pipelines using Airflow and Kafka, improving processing speed by 30%.
  • Developed custom Python-based connectors for heterogeneous data sources, integrating them into Snowflake and BigQuery pipelines using SQL and PySpark.
  • Created interactive Tableau dashboards to visualise key metrics and support business decisions.
Jul 2020 – May 2021
Xehen
AI Engineer
Xehen · Karachi, Pakistan
  • Built web scrapers for 50+ e-commerce websites using Scrapy, Selenium, and BeautifulSoup, storing structured data in PostgreSQL.
  • Developed a real-time school attendance system using Hugging Face facial recognition and ensemble ML models.
  • Built a 96-class dog breed classifier using YOLOv3 and landmark-based features, achieving 97% accuracy.
Tech stack
Skills
ML / Statistics
PyTorch · TensorFlow · Scikit-learn · XGBoost · ARIMA · LSTM · YOLOv3 · PaiNN · SchNetPack · Computer Vision · Time-series · Feature Engineering · Uncertainty Quantification
LLMs / NLP
OpenAI API · LangChain · Hugging Face · Prompt Engineering · NLP · Text Summarisation · Agentic Workflows · Generative AI · RAG
Data Engineering
Pandas · NumPy · Apache Airflow · Kafka · PySpark · Spark MLlib · Snowflake · BigQuery · PostgreSQL · MySQL · MongoDB · SQL
MLOps / Tools
Docker · MLflow · AWS · Azure · Git · pytest · Streamlit · Tableau · AWS QuickSight · Matplotlib · Seaborn
My work
Projects
Time-series · ARIMA · LSTM · AWS
Automated Sales Forecasting System
Built an end-to-end forecasting pipeline combining ARIMA for trend decomposition with an LSTM for sequential pattern capture, surfacing predictions through AWS QuickSight dashboards to support executive sales decisions.
Forecasting · XGBoost · Streamlit
Restaurant Footfall Forecasting
Designed a demand prediction system benchmarking ARIMA against XGBoost on real restaurant datasets — XGBoost reduced mean absolute error by ~15%. Delivered as a Streamlit app for weekly staffing and inventory planning.
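For readers unfamiliar with how a relative error reduction like the one above is computed, here is a minimal sketch: mean absolute error (MAE) for each model against the same held-out values, then the fractional drop of the candidate relative to the baseline. The function names are hypothetical and the numbers are invented toy values, not the project's data or results.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def relative_mae_reduction(actual, baseline_pred, candidate_pred):
    """Fraction by which the candidate model lowers MAE versus the baseline.

    A value of 0.15 means the candidate's MAE is 15% lower.
    """
    baseline = mean_absolute_error(actual, baseline_pred)
    candidate = mean_absolute_error(actual, candidate_pred)
    if baseline == 0:
        raise ValueError("baseline MAE is zero; reduction is undefined")
    return (baseline - candidate) / baseline


# Invented toy footfall counts, for illustration only.
actual_visits = [100, 120, 80, 90]
baseline_forecast = [110, 110, 90, 100]   # stands in for the ARIMA baseline
candidate_forecast = [108, 112, 88, 99]   # stands in for the XGBoost model

reduction = relative_mae_reduction(
    actual_visits, baseline_forecast, candidate_forecast
)
```

Both models must be scored on the same evaluation window for the comparison to be meaningful.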
LLM · LangChain · OpenAI
LLM Engineering Suite
Built five production-ready LLM apps — a meeting minutes generator, synthetic data pipeline, AI sales brochure writer, cafe assistant chatbot, and text summariser — using LangChain, OpenAI API, and Gradio across 18+ documented notebooks.
Computer Vision · MTCNN · OCSORT
Face Detection & Tracking Pipeline
Engineered a real-time multi-face pipeline combining MTCNN for sub-pixel-accurate face localisation with OCSORT for persistent identity tracking across video frames, targeting automated attendance and surveillance use cases.
Computer Vision · DeepLabCut · RNN
Cow Emotion & Posture Analysis
Built a two-stage deep learning pipeline that extracts 16 anatomical keypoints from cow images using DeepLabCut (ResNet backbone), then classifies happy vs. unhappy emotional states via an RNN — enabling automated livestock welfare monitoring at scale.
Computer Vision · Smart City
Real-time Traffic Analysis System
Developed a vehicle detection and flow analysis system using deep learning on live traffic footage, extracting lane-level density metrics and classification statistics to support smart city traffic engineering and signal optimisation decisions.
Medical AI · CNN
Osteoporosis Detection from X-rays
Trained a CNN to identify early-stage osteoporosis by learning bone density patterns directly from knee X-ray images, enabling automated screening that assists radiologists and reduces diagnostic turnaround time.
Computer Vision · YOLOv3 · dlib
Spyeye: Intelligent Surveillance System
Built a multi-modal security system using YOLOv3 for real-time weapon detection (knives, guns) and fight recognition in CCTV footage, combined with dlib-based face recognition — delivering a complete automated threat-alert solution for homes and businesses.
Academic background
Education
MS in Data Science
Oct 2023 – Oct/Nov 2026
Freie Universität Berlin · Berlin, Germany
Relevant coursework: Machine Learning, Statistics, Data Management, Data Visualisation, LLMs, Semantic Technologies.
MS Thesis · BIFOLD, TU Berlin
Transfer Learning for Bridging Machine Learning Force Fields and Diffusion Models
Training a multi-head model on equilibrium and non-equilibrium force datasets using SchNet/PaiNN in PyTorch. Benchmarking representation and output-shift conditioning strategies.
BS in Software Engineering
Jan 2017 – Mar 2021
Sir Syed University of Engineering and Technology · Karachi, Pakistan
Graduated 2nd in cohort · Top grade band (1.1, German scale)
Credentials
Certifications
Generative AI with Large Language Models
DeepLearning.AI & AWS
Issued: Dec 2024
No expiration
Credential ID: CU75G8RW4TRX
Generative AI · LLMs · Fine-tuning · RLHF
TensorFlow Developer Professional Certificate
DeepLearning.AI
Issued: Jul 2023
No expiration
Credential ID: 677438ed8472ac0e
Deep Learning · CNNs · NLP · TensorFlow
PIAIC Artificial Intelligence
Presidential Initiative for AI & Computing
No expiration
Machine Learning · Python · AI · Deep Learning
Machine Learning with PySpark
DataCamp
No expiration
PySpark · Spark MLlib · Big Data · ML
Python Programming Introduction
Microsoft
Issued: Apr 2024
No expiration
Credential ID: 5GKSBZWSQS4M
Python · Programming · SQL

Let's connect

Open to entry-level Data Scientist, ML Engineer, and AI Engineer roles. Also happy to discuss research collaborations and AI-focused projects.