Leo Torres

Lead Data Scientist & Full-Stack Developer

LLMs & AI • Python • BigQuery • Leipzig/Berlin

About

I'm a Lead Data Scientist at FGS Global, building LLM-powered data pipelines that process 1M+ news articles daily for Fortune 500 clients. Based in Leipzig, Germany (commuting to Berlin), I specialize in large-scale AI systems, distributed data processing, and full-stack development.

I architect data infrastructure that handles billion-row datasets with sub-second query performance, using BigQuery, Python, and Google Cloud. As tech lead for our flagship internal product, I manage a team of 6 engineers and implement retrieval-augmented generation (RAG) architectures backed by vector databases.

My approach combines rigorous computer science fundamentals with practical engineering solutions. Whether it's reducing processing time by 10x through optimized pipelines or building robust APIs that serve multiple products, I deliver measurable impact at scale.

Technical Expertise

Core Competencies

Full-stack development with a focus on backend systems, data engineering, and ML infrastructure. Experienced in taking projects from prototype to production.

Python • JavaScript/TypeScript • Distributed Systems • Machine Learning • Data Pipelines • API Design • Cloud Architecture • Performance Optimization

Featured Projects

FGS Global — MediaIQ

Lead Data Scientist and Tech Lead for the flagship internal product, which processes 1M+ news articles daily for Fortune 500 clients. Built LLM-powered pipelines with a RAG architecture, managing a team of 6 engineers while architecting infrastructure that handles 1B+ log entries.

Impact: Reduced processing time by 10x through optimized data pipelines, enabling real-time insights for enterprise clients across multiple industries.

Python • FastAPI • BigQuery • GCP • LLMs • RAG • Vector Databases
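For readers unfamiliar with RAG, the retrieve-then-augment step can be illustrated with a minimal, self-contained sketch. This is a generic example, not MediaIQ code: it assumes documents are already embedded as plain float vectors, ranks them by cosine similarity in memory instead of querying a vector database, and stops before the LLM call. The names `cosine`, `top_k`, and `build_prompt` are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], docs: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return ids of the k documents whose embeddings are most similar to the query."""
    # Brute-force ranking; a production system would delegate this to a vector index.
    ranked = sorted(docs, key=lambda doc_id: cosine(query, docs[doc_id]), reverse=True)
    return ranked[:k]


def build_prompt(question: str, passages: list[str]) -> str:
    """Augment the question with retrieved passages before it is sent to an LLM."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings" (assumption); real ones come from an embedding model.
    embeddings = {
        "a": [1.0, 0.0, 0.0],
        "b": [0.0, 1.0, 0.0],
        "c": [0.7, 0.7, 0.0],
        "d": [0.0, 0.0, 1.0],
    }
    hits = top_k([1.0, 0.1, 0.0], embeddings, k=2)
    print(hits)  # ['a', 'c']
    print(build_prompt("What happened?", [f"text of document {h}" for h in hits]))
```

The brute-force sort is linear in the number of documents, which is exactly the cost a vector database avoids with approximate nearest-neighbor indexes.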

The Aris Program

Founder and Lead Developer of a next-generation academic publishing platform. Building open-source tools for collaborative scientific writing with real-time version control. Full-stack architecture using Python, FastAPI, Vue.js, and HTMX.

Vision: Change how researchers collaborate and publish, with a focus on transparency and reproducibility. Launching in 2025.

Python • FastAPI • Vue.js • HTMX • PostgreSQL • Docker • Netlify

XGI: Complex Group Interactions

Co-Lead Developer of a Python library for analyzing higher-order networks and hypergraphs. NumFOCUS-affiliated project with a growing academic user base. Implemented core algorithms, designed the API, and established a comprehensive testing framework.

Technical leadership: OOP design, CI/CD with GitHub Actions, performance optimization with NumPy and Numba.

Python • NumPy • pandas • Numba • pytest • GitHub Actions • OOP
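Higher-order networks replace pairwise edges with hyperedges that can join any number of nodes. The plain-Python sketch below illustrates that data model by storing each hyperedge as a set of node ids and inverting it to compute node degrees. It is only a conceptual illustration; it does not use or reproduce XGI's API, and the example data and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical hypergraph: each hyperedge id maps to the set of nodes it joins.
# Unlike a graph, a single edge may contain 2, 3, or more nodes.
edges: dict[str, set[int]] = {
    "e1": {1, 2, 3},
    "e2": {2, 3},
    "e3": {3, 4, 5, 6},
}


def memberships(edges: dict[str, set[int]]) -> dict[int, set[str]]:
    """Invert edge -> nodes into node -> edges (the incidence relation seen from nodes)."""
    result: dict[int, set[str]] = defaultdict(set)
    for edge_id, members in edges.items():
        for node in members:
            result[node].add(edge_id)
    return dict(result)


def degrees(edges: dict[str, set[int]]) -> dict[int, int]:
    """Degree of a node = number of hyperedges that contain it."""
    return {node: len(es) for node, es in memberships(edges).items()}


def edge_sizes(edges: dict[str, set[int]]) -> dict[str, int]:
    """Size of a hyperedge = number of nodes it joins."""
    return {edge_id: len(members) for edge_id, members in edges.items()}


if __name__ == "__main__":
    print(degrees(edges))     # node 3 belongs to all three hyperedges, so its degree is 3
    print(edge_sizes(edges))  # {'e1': 3, 'e2': 2, 'e3': 4}
```

Keeping both the edge-to-nodes mapping and its inverse makes node-centric and edge-centric queries equally cheap, at the cost of storing the incidence relation twice.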

Manim Community

Organization Owner and Core Developer of the community-maintained version of 3Blue1Brown's mathematical animation engine. Contributing to the open-source Python library that creates precise, programmatic mathematical visualizations and educational content.

Recognition: Featured in GitHub's Popular Python Repositories.

Contributions: Algorithm implementations, performance optimizations, documentation improvements, and community support for mathematical animation workflows.

Python • Mathematical Visualization • OpenGL • Cairo • Community Development

COVID-19 Mobility Data Pipeline

Engineered a data pipeline processing mobility data for 300+ US cities during the COVID-19 pandemic. Built ETL workflows using Apache Airflow, implemented data quality checks, and optimized geospatial queries with PostGIS.

Results: Enabled epidemiologists to analyze movement patterns in near real-time, contributing to public health policy decisions.

Python • Airflow • pandas • PostGIS

Industry Experience

Lead Data Scientist & Tech Lead

May 2023 - Present

FGS Global

Built LLM-powered pipelines processing 1M+ news articles daily for Fortune 500 clients

Architected data infrastructure handling 1B+ log entries with sub-second query performance

Tech lead for the flagship internal product, managing a team of 6 engineers

Implemented RAG architecture and vector databases for domain-specific information retrieval

Research Intern

May 2019 - Jul 2019

Yahoo! Research

Built graph representation learning models on Tumblr social network data

Processed terabyte-scale datasets using PySpark and distributed computing

Developed Python pipelines for large-scale network analysis

Research Programmer

2012 - 2014

Wolfram Research South America

Developed data pipelines for the Wolfram|Alpha knowledge engine

Owned specific data domains end-to-end, including ingestion and quality

Worked in a remote, globally distributed team

Open-Source Maintainership

Co-Lead Developer — XGI

Aug 2021 - Present

NumFOCUS-affiliated Python library for higher-order networks

Designed public API, core algorithms, and CI/CD; performance work with NumPy and Numba

Library adopted by researchers across academia and industry

Organization Owner & Core Developer — Manim Community

May 2020 - May 2021

Community-maintained fork of 3Blue1Brown's mathematical animation engine

Featured in GitHub's Popular Python Repositories; grew project from fork to active community

Algorithm implementations, performance work, release management, contributor onboarding

Co-Lead Developer — netrd

Jan 2019 - Jul 2019

Library for network reconstruction and comparison (JOSS-published)

Implemented 40+ algorithms; coordinated 12+ contributors; set coding standards

Reviewer — Journal of Open Source Software

Jul 2020 - Present

Peer review for scientific software submissions

Research Engineering & Academia

Postdoctoral Fellow — Mathematics

Aug 2021 - May 2023

Max Planck Institute for Mathematics in the Sciences

Spectral graph theory research applied to complex networks

Implemented high-performance graph mining tools in Python alongside published research

PhD, Network Science

2016 - 2021

Network Science Institute, Northeastern University

Dissertation: Spectral Aspects of Mining Complex Networks

Developed open-source Python libraries used by the research community

Technical Skills

Languages & Frameworks

  • Python: NumPy, pandas, SciPy, PyTorch, FastAPI, Django, Celery
  • JavaScript: Node.js, React, TypeScript, D3.js, Express
  • Systems: C++, Rust (learning), Go (basic)
  • Other: SQL, GraphQL, Shell scripting, LaTeX

Infrastructure & Tools

  • Cloud: AWS (EC2, S3, Lambda, SageMaker), GCP, Azure
  • Databases: PostgreSQL, MongoDB, Redis, Neo4j, TimescaleDB
  • DevOps: Docker, Kubernetes, Terraform, GitHub Actions, CircleCI
  • Monitoring: Prometheus, Grafana, ELK Stack, Datadog

Methodologies & Practices

  • Architecture: Microservices, Event-driven, REST/GraphQL APIs
  • Development: TDD, CI/CD, Code review, Pair programming
  • Data: ETL pipelines, Stream processing, Data modeling
  • ML Ops: Model versioning, A/B testing, Feature stores

Soft Skills

  • Technical leadership and mentoring
  • Cross-functional collaboration
  • Technical documentation and knowledge sharing
  • Remote team coordination (5+ years)