MINH TRAN

I am a graduate student in the Master of Science in Computer Vision program at Carnegie Mellon University, advised by Prof. Matthew P. O'Toole, where I work on imaging, cameras, and machine learning. For my capstone project at Meta Reality Labs, I was advised by Prof. Deepak Pathak, He Wen, and Yuan Dong, and worked alongside my partner Ashwin Vaswani.

Before CMU, I worked at Actuate AI as a Data Scientist for more than three years. Before that, I earned a degree in Data Science from DePaul University, where I was advised by Prof. Jacob Furst, and a degree in E-commerce from Foreign Trade University of Hanoi, where I was advised by Prof. Hung Nguyen.

 

Introduction

Research Interests.

Currently, I focus on 3D vision applications, specifically on integrating vision and language to generate CAD models with industrial accuracy. My goal this year is to develop a versatile framework that produces CAD models of furniture components from image and/or language inputs and handles out-of-distribution requests such as unconventional chair designs.

Perception

If humans could see stars as nocturnal animals do, hear high-frequency sounds, and sense magnetic fields through external sensors, how would the brain process these new kinds of signals? How would this affect our conscious and subconscious minds? Is it possible to unlock new perceptions for humans, and if so, how?

Robot Evolution

Biological species evolve based on primal goals. How could robots evolve into a highly organized society similar to human society, and what would it look like? How would they cooperate, handle conflicts, yield, compromise, and come up with new, non-predetermined goals (such as performing art or exploring space)? When robots can explore through self-adjustment (questioning, reasoning, and creating), do they become truly intelligent subjects?

 

My work

Research Experience and Selected Projects.

The following projects showcase my skills and experience through real-world examples of my work. Each project is briefly described, with links to code repositories and reports. Together, they reflect my ability to solve complex problems and manage projects effectively.

Hard example mining for multi-view part segmentation

Part segmentation can reduce the ambiguity of meshes used in downstream AR/VR tasks at Meta. Multi-view (MV) part segmentation faces challenges due to complexity and high labeling costs/time (can...

CubeSat Localization

Classify regions and detect landmarks for localization...

MLMF: Multi-modal Meta-Learning for Federated Tasks

Improve learning with missing modalities in federated settings...

Pose Estimation for Stereo Visual Odometry - Updating

Optical Flow and Depth for Pose Estimation on TartanAir...

Structured Light for Fruit Freshness Prediction

Is structured light better?...

Fine-tuning 2D Segmentation for 3D Vision - Updating

Fine-tune the SAM model for 3D inputs...

Meta Learning for Few-Shot Medical Text Classification

Meta Learning with Distributionally Robust Optimization for Medical Text...

Camera Next Door - Updating

A homemade pipeline for incremental learning in object detection...

YOLOv4+SORT

Add an object tracker to YOLOv4 Darknet...

PrismRanger Localization - Updating

Localization on the Moon using ground images...

 

My Accomplishments

Awards & Experiences.

 

Courses I took

Related Coursework.

16-820

Advanced Computer Vision.

11-777

Multimodal Machine Learning.

16-811

Mathematical Fundamentals for Robotics.

16-861

Space Robotics.

16-823

Physics-based Methods in Vision.

16-825

Learning for 3D Vision.

15-858

Discrete Differential Geometry.

CS330

Deep Multi-Task and Meta Learning.

IS467

Fundamentals of Data Science.

CSC424

Advanced Data Analysis.

CSC478

Machine Learning.

CSC495

Social Network Analysis.

CSC481

Image Processing.

CSC528

Computer Vision.

CSC555

Mining Big Data.

CSC529

Advanced Data Mining.

CSC578

Neural Networks and Deep Learning.

CSC594

Natural Language Processing.

CSC587

Cognitive Science.