Work
XPeng Motors
Guangzhou, China
Big Data Architect - platform team
Jun 2021 - Present
- Built a real-time autonomous driving data processing platform that serves both domestic and
international collection vehicles and user vehicles. The platform handles images, video, LiDAR,
and CAN bus data, processing an average of over 60 TB per day while keeping processing time
under 12 hours. It covers ingestion, cleaning, filtering, and anonymization across multiple data
sources, sensors, and scenarios, so downstream users can quickly access accurate, compliant data
through an efficient and reliable pipeline.
- Data from collection vehicles serves as cold-start data for the perception team to build a
foundational model. Meanwhile, user vehicles remotely trigger events when they encounter corner
cases or NGP (Navigation Guided Pilot) failures that require driver intervention. Each trigger
captures the scene at that moment, including images, video, LiDAR, CAN bus data, text logs, and
DDS (Data Distribution Service) messages carrying on-vehicle model outputs, and the recording is
uploaded to the cloud. There, the platform injects the data into a simulation system so the
perception team can reconstruct and analyze each scenario. This addresses the long-tail problem
of perception models and contributes significantly to improving model KPIs.
- Built a cloud-based pipeline that uses Unreal Engine (UE) to generate synthetic data,
producing large datasets targeted at specific corner cases. By complementing real-world
collected data, it improves model KPIs while reducing overall data collection costs.
- Developed the company's first image and video anonymization system, deployed both in the
cloud and on vehicles, and applied it to domestic and international data collection tasks. The
system protects user privacy by ensuring sensitive data is properly handled and protected
(1 patent, company quality project award).
- Used CDC (Change Data Capture) to aggregate data from multiple business systems, including
data pipelines, map POIs (Points of Interest), annotation platforms, and full-stack test
management, with support for multiple data types. Analyzed and refined the aggregated data to
provide users with actionable insights.
- As Technical Lead, designed parts of the data flow architecture, improved data quality, and
reduced costs. Responsibilities include project delivery, coordinating team resources, and
minimizing technical debt.
- Keywords: Autonomous Driving Big Data Processing
Grab
Singapore
Big Data Architect - traffic team
Jun 2019 - Jun 2021
- Established a real-time geospatial traffic flow platform covering over 100 cities in
Southeast Asia by integrating real-time location data from drivers and delivery riders. The
platform provides real-time traffic conditions, optimal route planning, estimated time of
arrival (ETA), and fare prediction services for the core ride-hailing and food delivery
businesses.
- Re-architected the real-time traffic computation system by modularizing and decoupling its
components, enabling quick integration of third-party map data sources such as Google Maps,
HERE Maps, and SKT Maps. Also implemented the full continuous delivery process on Kubernetes,
streamlining the deployment of updates and new features.
- Decoupled offline data from real-time computation through source modularization, reducing
system complexity and operational costs. Introduced real-time aggregation of vehicle GPS data
with watermark calculations, improving ETA accuracy and billing fairness.
- Keywords: Traffic Flow Big Data Processing
Tencent
Shenzhen, China
Big Data Engineer - platform team
May 2018 - Jun 2019
- Introduced automated sharding and indexing with hierarchical splitting, improving log
indexing speed by 30% on the game log retrieval platform and speeding up the flow of user data.
- Designed a unified interface framework that supports routing, migration, hierarchical
structuring, scaling, monitoring, and fault recovery of game data across different database
components, reducing complexity for users.
- Developed a management system for multiple database components, improving the efficiency of
automated database management and operations.
- Open-sourced a standalone, comprehensive automated monitoring dashboard for Elasticsearch.
- Keywords: Gaming Big Data Storage
JD.com
Beijing, China
Big Data Engineer - ads quality team
Jul 2015 - Apr 2018
- Designed an intelligent advertising bidding platform that incorporates a Multi-Touch
Attribution (MTA) model and confidence intervals to address sparse advertising data across
dimensions. The platform lets advertisers place ads easily and intelligently, increasing Return
on Investment (ROI) by 5% on both the platform's internal and external pages (1 patent, company
gold project award).
- Implemented a multi-dimensional valuation system that enables advertisers to bid
automatically and differentially across channels, allocating their advertising budgets
precisely.
- Built a product retrieval system integrated with WeChat's social user base, achieving a
monthly Gross Merchandise Volume (GMV) in the millions and attracting new users.
- Keywords: Advertising Big Data Retrieval