Haojun Gao
Quant researcher, machine learning engineer, and cloud solution architect. Background in Business Analytics, Computer Science and Finance. Passionate about data-driven business and business intelligence.
Currently
- Standing on the shoulders of giants.
Specialized in
Data Science, Cloud Architecture, Quantitative Trading, and Finance
Research interests
Natural Language Processing, Information Retrieval, and Information Systems.
EDUCATION
July 2020 - now
National University of Singapore (NUS), Singapore
- Master of Science in Business Analytics (jointly offered by the School of Computing and the Business School)
- Coursework: Data Warehousing, Advanced Analytics and Machine Learning, Big-Data Analytics Technology
- Member, NUS Business Analytics Club
Sep 2016 - Jun 2020
University of Electronic Science and Technology of China (UESTC), China
- Dual Bachelor’s Degree in Computer Science and Finance
- GPA: 3.88/4.0 with Outstanding Thesis Award
CERTIFICATIONS
Dec. 2021
AWS Certified Machine Learning - Specialty
Dec. 2020
AWS Certified Solutions Architect - Associate
ACHIEVEMENTS
2017 & 2018 & 2019
First-class Scholarship for Excellent Students
Dec. 2017
Suzhou Industrial Park Scholarship
Nov. 2017
Top 10 School Athlete, UESTC
SKILLS
Programming
Python, Java, MATLAB, C/C++, HTML, CSS, and JavaScript
Software
Spark, Hadoop, SPSS, WEKA, LaTeX, Office, Apache MXNet, and AWS services including S3 and Aurora
Algorithms
Traditional machine learning algorithms such as SVM and Naïve Bayes; ensemble methods such as XGBoost and gcForest; deep learning models including CNN, RNN, ResNet, VGG19, attention mechanisms, and sequence-to-sequence models; clustering algorithms and dimensionality reduction techniques. Frameworks: TensorFlow and PyTorch.
LEADERSHIP
Apr 2020 - Apr 2022
AWS Community Builder, AWS
Jul 2018 - Jul 2019
Head, Innovation and Entrepreneurship Workshop of the University, UESTC
Jul 2017 - Jul 2018
Team Leader, Data Science Group, Innovation and Entrepreneurship Workshop of the University, UESTC
PROFESSIONAL EXPERIENCE
Nov 2021 - Feb 2022
Quant Researcher, Shanghai Qianxiang Asset Management, Shanghai, China
Concurrent duties:
1) Data Streaming ETL Pipeline
- Participated in the firm’s systematic CTA, which relies on automated trading strategies and models
2) Deep Learning Engineering
- Used unstructured data, such as news and user-generated content on social media, to create new indicators that could be integrated into the existing strategy and pattern-based model.
Oct 2020 - Oct 2021
Cloud Architect and Machine Learning Engineer, Poodle Finance Technology Co., Ltd., Singapore
Concurrent projects:
1) Machine Learning & Finance Modeling
- Start-up based in Singapore. Responsible for U.S. stock-related data and model ETL. Used cleaned financial report data and unstructured text data to automatically generate a general analysis of the fundamentals of listed companies, including each company’s growth, valuation, management capabilities, and financial health.
- Used web crawling to collect public user-generated content (UGC) from social media, and used the DistilBERT model to analyze the sentiment distribution toward listed companies.
2) Cloud Architecture & Backend Development
- Designed a high-availability cloud architecture for migrating the company’s business to the cloud, covering cost control via serverless architecture, access management through IAM roles, and disaster recovery through DynamoDB replication across Availability Zones. Reduced total expenditure by 40%.
- Responsible for agile development, with a good understanding of Jira workflows, team environment management, and automation through CloudFormation and Python scripts.
Apr 2021 - Apr 2022
Amazon Web Services (AWS) China (2X Certified), Singapore
1) AWS Community Builder, Singapore
- AWS Certified Solutions Architect - Associate and AWS Certified Machine Learning - Specialty, with hands-on experience in DevOps, CI/CD, agile development, and machine learning integration.
- Published more than 20 community articles on serverless architecture and machine learning on AWS; worked closely with AWS experts and Heroes, gaining early access to newly launched features and beta tests.
2) Solutions Architect Intern (AWS program for college students), Sichuan, China
- Participated in a serverless frontend application project using artificial intelligence solutions hosted on AWS, so that client companies can submit data to the model and receive live predictions through a user-friendly interface
- Developed a website log analysis system, including a dashboard that monitors user-perceived latency, reports website availability, and triggers alarms based on daily log analysis
Sep 2019 - Aug 2020
Machine Learning Engineer, Shanghai Scishang Data Technology Co., Ltd, Shanghai, China
- Responsible for shopping mall traffic modelling, applying machine learning algorithms in Python and SQL to analyse passenger flows at more than 1,000 large shopping malls in China
- Established a customer flow anomaly warning mechanism that guides company management on troubleshooting, personnel transfer, and security deployment
- Worked closely with the company founder to propose a pioneering mall traffic interpretation system; the proposed model has since become the industry standard
Jun 2017 - Jul 2019
NLP Engineer and Research Assistant, Electronic Commerce Lab, School of Economics and Management, UESTC
- Hate Speech Detection System: Integrated a deep convolutional network with a block attention module to build the detection model. Addressed the variety of platforms and the need to move beyond keyword-based methods, which have been shown to miss many instances of hateful speech.
- Automatic Taxonomy Generation: Proposed an unsupervised framework to generate a topic taxonomy from massive unstructured text data using non-negative matrix factorization and frequent pattern mining.
RESEARCH WORK AND PUBLICATION
2020
Project Member and Paper Co-author, “Surviving Covid-19: Recovery Curves of Mall Traffic in China”
- Revealed the impact of Covid-19 on China’s real economy and its recovery curve
- Research results and conclusions were adopted by the Chinese Ministry of Commerce and cited as reference data for China’s economic recovery
2020
Lead Author, “The Causal Impact of Mall Entry of Incumbents: the Role of Substitution, Agglomeration and Life Cycle”
- Revealed the driving mechanism of customer transactions in shopping malls, including the agglomeration effect, the substitution effect, and their alternation over the life cycle of business districts
- Paper and findings were accepted by and presented at the INFORMS Marketing Science Conference 2020 (held online due to Covid-19)
NOTEWORTHY PROJECTS
2019
Document Search Robot Development Using Amazon Lex and Amazon Elasticsearch Service
- Attended a two-week training course from AWS Education covering AI services, image recognition technology, and architecture technology, with 20 experiments
- Built a document search robot using Amazon Lex and Amazon Elasticsearch Service
- Completed the upload service: used a CloudFormation stack to create an Amazon S3 bucket and two Elasticsearch clusters, and used an AWS Lambda function with the related IAM roles to access the bucket
- Built the search robot by importing a default template into Lex, integrated the web UI using Cognito identity pools to obtain temporary credentials, and used CloudFront for online deployment
2018
Multi-label Text Classification Research
- Applied deep learning and traditional machine learning algorithms to multi-label classification of senior high school questions by subject, successfully meeting the benchmark.
- Used TF-IDF and Word2Vec to preprocess the raw data and applied CNN, Char-CNN, random forest (RF), and SVM for multi-label text classification; found the efficiency of this approach lacking.
- Improved classification precision while maintaining the same level of recall using hierarchical attention networks, which captured more hierarchical features and sequential information.
- Developed an algorithm that uses ensemble methods to refine representation learning, and addressed the model’s heavy workload with gcForest, achieving a state-of-the-art F-score.
2018
Hate Speech Detection with Deep Learning Algorithm
- Inspired by “Analysis of Hate Speech in Social Media”, research conducted by the UCSB NLP Group
- Designed a search pipeline module in Python to collect data streams from social media and store them on a local server; used XXX to build Chinese and English datasets
- Studied domestic and foreign literature; integrated a very deep convolutional neural network with a block attention module to build and train the detection model, addressing the variety of platforms beyond Twitter that incubate hate speech and the need to move beyond keyword-based methods, which have been shown to miss many instances of hateful speech
- Tested the approach: the F1-score increased by 7 percentage points on the English corpus and by 3 percentage points on the Chinese corpus
2017-2018
Constructing Topical Concept Taxonomy with Adaptive Local Embedding and Clustering
- Applied non-negative matrix factorization and hierarchical clustering to construct a topical taxonomy in a recursive fashion
- Built a hierarchical co-clustering module by using non-negative matrix tri-factorization for allocating attractions and things of interest to topics when splitting coarse topics into fine-grained ones
- Established a concept extraction module that extracts concepts for every topic while maintaining strong discriminative power at different levels of the taxonomy
- Devised a new topical taxonomy approach by constructing a minimum spanning tree with a greedy heuristic algorithm
2016
Hardware Solution Test Engineer, Chengdu ACTi Technology & Development Co., Ltd
- Rotated from assembly line work to module development; fulfilled customers’ demands by designing electronic circuits and conducting performance tests
- Resolved a research problem within half a day by proactively organizing a special team of people from across the entire production process
- Developed a sorting machine that solved the problem of mixed screws on the production line, improving work efficiency