CV

General Information

Full Name: Phanindra Kalaga
Email: phanindra.connect@gmail.com
Phone: +1-771-233-3129
Website: phanindra-max.tech
Languages: English, Telugu, Hindi

Professional Summary

Summary: Data scientist skilled in end-to-end problem-solving, from framing challenges to production deployment. Proficient in building robust data pipelines with Python and SQL, and training machine learning models with a focus on NLP and LLMs. Experienced in deploying models as scalable APIs, creating decision-ready dashboards, and shipping open-source software with modern CI/CD practices. Comfortable working across the full data stack, from backend services to frontend interfaces.

Education

  • 2024 – Present
    Master of Science in Data Science
    George Washington University, Washington, D.C.
    • GPA: 3.90
    • Relevant Coursework: Machine Learning, NLP, Data Mining, Computer Vision, Cloud Computing
  • 2019 – 2023
    Bachelor of Technology in Computer Science and Engineering
    Jawaharlal Nehru Technological University, India

Experience

  • April 2024 – Present
    Lead Graduate Research Specialist, LAiSER
    George Washington University, Washington, D.C.
    • Open Source: Built an open-source NLP package for skills and workforce analytics; co-authored a paper submitted to the Journal of Open Source Software (JOSS).
    • Data Engineering: Designed scalable pipelines for skill extraction from 1M+ job descriptions; achieved 7x higher throughput using vLLM on HPC clusters.
    • Data Visualization: Built Neo4j and Plotly 3D maps to show skill gaps across 1,000+ university courses for the Texas Workforce Commission; work informed two local training policies.
    • Project Management: Ran weekly Scrum meetings; delivered 5 milestones on or ahead of schedule using GitHub Actions and issue tracking.
    • Dev Environment: Standardized the analytics environment on AWS and authored shared Python libraries for the team.
    • Data Ethics: Authored data-ethics guidelines; eliminated 100% of PII from public datasets.
  • Aug 2024 – Present
    Graduate Assistant, Natural Language Processing Class
    George Washington University, Washington, D.C.
    • Automation: Built a Python autograding toolkit; cut grading turnaround from 3 days to 2 hours across 10 assignments (~5 minutes per submission), saving ~100 TA hours and reducing regrade requests to 0.
    • Cloud Runtime: Standardized the environment on AWS EC2 with a CUDA/PyTorch AMI; configured S3 for artifacts, IAM for access, and CloudWatch for monitoring.
    • Course Dashboards: Built instructor dashboards and learner survey tools to track assignment and activity patterns.
    • Tech: Python, pandas, NumPy, scikit-learn, NLTK, Hugging Face, Matplotlib, OpenPyXL, AWS (EC2, S3, IAM, CloudWatch), CUDA, PyTorch, Blackboard.
  • Aug 2024 – Dec 2024
    Data Science Intern, Data Science for Sustainable Development
    George Washington University, Washington, D.C.
    • Full-Stack Engineering: Led a team of 5 to design and deploy an open-access repository of 200+ non-profits (Next.js, Flask, Supabase on AWS EC2/S3).
    • Data Analysis: Designed Figma wireframes and embedded n8n AI agents for admin insights across 50+ projects; increased partner collaboration by 30%.
  • July 2021 – Jan 2023
    Full-Stack Engineer
    NashAgri, an agritech B2C organization, Maharashtra, India
    • Product: Built a cross-platform CRM from scratch; added ML analytics to improve operational efficiency.
    • Geospatial: Managed geospatial data for 45,000+ farmers across 25,000+ regions (MongoDB, GeoJSON); generated insights for partnerships.
    • Backend: Engineered 5+ core features (invoice generation, RBAC, real-time auctions) supporting a 40% increase in vendor transactions in the first quarter.

Projects

  • 2024
    Reinforcement Learning for Pseudo-Labeling
    • Built a model-agnostic, custom RL environment for data annotation to outperform state-of-the-art semi-supervised techniques and enable reproducible evaluation.
    • Tools: Gymnasium, High-Performance Computing (HPC), TensorBoard, PyTorch (torchvision)
  • 2024
    Benchmarking Database Architectures for Network Analytics
    • Reproducible Python benchmark of MySQL, MongoDB, and Neo4j on a synthetic road network dataset to deliver recommendations on which database performs best for each query type.
    • Tools: Python, MySQL, MongoDB, Neo4j (graph algorithms), SQL recursive CTEs, MongoDB aggregation/lookup, psutil (performance monitoring), Faker
  • 2024
    Global CO₂ Emissions Analysis Dashboard
    • Built a dashboard analyzing 223 years of CO₂ data across countries, sectors, and income groups to identify the relationship between historical emissions and renewable-adoption priorities.
    • Tools: Tableau, Python, pandas, Plotly, Matplotlib
  • 2023
    My Own Medic
    • End-to-end open-source AI architecture for a medical assistant that leverages electronic health records (EHRs) and LLMs to prevent medical errors; the project was selected for presentation at UN Open-Source Week 2025, New York.
    • Tools: MySQL, PHP, Hugging Face API, Data Compliance (HIPAA)

Leadership & Developer Relations

  • 2024 – Present
    President, Google Developer Groups On-Campus at GWU
    • Led a 200+ member student organization promoting AI/ML awareness through events and workshops; co-hosted DevFestDC (900 attendees) and DevFest Annapolis (300 attendees).
  • 2024 – Present
    Chairperson of Relations, Data Science Association
    • Cultivated relationships with C-suite executives and senior engineers to host events that provided career insights to 170 data science students.

Presentations and Awards

  • October 2025
    • DC Startup and Tech Week: Hosted a technical workshop for 60+ startup founders
  • July 2025
    • Badge Summit at CU Boulder: Hosted a table talk with 50 leaders in higher-education research
  • May 2025
    • GWU Open Source Conference: Delivered a keynote session on open-source software best practices
  • January 2025
    • GWU Open-Source Student Awards: Received 3rd prize for best open-source software

Technical Skills

  • Languages
    • Python, SQL, JavaScript, R
  • ML/AI
    • NLP, LLMs, vLLM, PyTorch, scikit-learn, Hugging Face
  • Data & Infrastructure
    • pandas, NumPy, Spark, Airflow, Docker
  • Cloud Platforms
    • AWS (EC2, S3, IAM, CloudWatch), GCP
  • Databases
    • PostgreSQL, MySQL, MongoDB, Neo4j
  • Web & Serving
    • FastAPI, Flask, Next.js, React
  • MLOps & DevOps
    • Git, GitHub Actions, CI/CD, MLflow
  • Visualization
    • Plotly, Matplotlib, Tableau