wyang10/README.md

Hi there 👋 I’m Audrey~ 🚀

About Me 🌱

I'm a Cloud Data Engineer focused on building scalable, reliable, and cost-efficient cloud data platforms.
I specialize in turning raw, messy, multi-source data into trusted analytics layers and ML-ready pipelines
through a mix of modern ELT, streaming systems, and strong distributed systems fundamentals.


Quick Pitch 💬

🎓 MSCS @ Northeastern University (2022–2024)
☁️ Focus: Cloud-Native Data Engineering, spanning streaming (Kafka/Flink), orchestration (Airflow/Dagster), and modeling (dbt)
🔗 Connect: GitHub: wyang10 • LinkedIn: linkedin.com/in/awhy


Highlights 💡

  • Cloud Data Systems: Airflow • dbt • Snowflake • BigQuery • Terraform
  • Streaming Architecture: Kafka/Flink • stateful processing • exactly-once pipelines
  • Distributed Systems: idempotency • back-pressure • partitioning strategies
  • ELT/ETL Optimization: incremental models • data quality • orchestration best practices
  • Feature Engineering: online/offline store design • feature pipelines

Experience 🧩

Data Engineer - LumiereX (Jan 2025 – Present)
Built core ELT frameworks, improved data quality layers, and optimized Spark jobs for cost/performance.
Designed cloud-native data pipelines supporting analytics and ML-driven decisions.

Software Engineer Intern - VisionX (Jan 2024 – Jul 2024)
Implemented scalable ingestion APIs, automated batch ETL workflows,
and contributed to the design of ML feature extraction pipelines.


Featured Projects 👨‍💻

A production-ready ELT and data quality framework built on Airflow, dbt, Great Expectations, and CI/CD.
It automates data ingestion, transformation, testing, and lineage tracking in a reproducible orchestration system.

End-to-End, Reproducible ML Pipeline: a modular, production-style ML system for predicting in-hospital mortality. It goes from raw CSV → cleaned features → baseline models → reproducible CLI pipeline, with optional SMOTE to address severe class imbalance.
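
For readers unfamiliar with SMOTE, the oversampling idea can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the project's code: the function names `smote_like_oversample` and `balance` are hypothetical, features are assumed to be numeric tuples, and a real pipeline would more likely rely on a maintained library implementation.

```python
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (hypothetical helper)."""
    if n_new == 0:
        return []
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Nearest neighbours are searched among the other minority samples only.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def balance(X, y, minority_label=1, k=3, seed=0):
    """Append synthetic minority rows until both classes have equal counts
    (binary labels assumed)."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    n_new = max(0, n_majority - len(minority))
    new_points = smote_like_oversample(minority, n_new, k=k, seed=seed)
    return list(X) + new_points, list(y) + [minority_label] * len(new_points)
```

Oversampling of this kind is normally applied to the training split only, so that synthetic rows never leak into evaluation data.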


How I Work 👯

  • I design modular, observable pipelines that are easy to test, debug, and scale.
  • I prioritize trade-offs that maximize team velocity, reliability, and cloud spend efficiency.
  • I enjoy collaborations involving data modeling, pipeline quality, and distributed system design.

Core Skills ⚑

Languages & Tools
Python (Pandas, PySpark) • SQL • Java • Scala • Bash

Cloud & Orchestration
GCP (BigQuery, Dataflow) • AWS (S3, EMR, Glue, Lambda) • Airflow • Dagster • dbt • Docker
Kubernetes • GitHub Actions • Terraform

Big Data & Storage
Spark • Flink • Kafka • Databricks • Hive • HDFS
Snowflake • Delta Lake • Parquet • dimensional modeling

Data Quality & CI/CD
Great Expectations • dbt tests • automated lineage • monitoring


😄 Thanks for stopping by! 👋

Pinned

  1. Macro-Market-Intelligence-Pipeline Public

    Developed an automated macro intelligence pipeline integrating prediction market data (Polymarket) and macro indicators for real-time decision summaries.

    Python 1

  2. airflow_dbt_demo Public

    Airflow + dbt + Snowflake + Postgres + Docker + CI/CD (Postgres‑backed)

    HTML 1

  3. Android_WeatherFinder Public

    A simple Android app to search weather by city name and display real‑time weather info (city, country, description, temperature

    Java 1

  4. Openai-DBAuctionSystem Public

    DBAuctionSystem: Furniture Auction Platform (Django + MySQL + Streamlit)

    JavaScript 1

  5. Smote-Heart-Attack-ML Public

    A Modular, Production-Style ML Pipeline with Class-Imbalance Handling

    Jupyter Notebook 1

  6. AI-Photo-Generator Public

    End-to-end system for generating compliant ID photos from user uploads, featuring a production-style workflow from raw images → segmentation/matting → face-aligned cropping → background synthesis →…

    JavaScript 1