Black Friday is just around the corner, and many Canadian shoppers will drive south to join the shopping extravaganza. If you plan to cross the border on November 29, you will want to plan smartly to avoid long delays. To help with this, we built a web app that forecasts border crossing wait times for the next 7 days.

Here is the workflow of the project:

  1. Retrieve border crossing wait times from the Cascade Gateway API
  2. Build a predictive model of future wait times using Python and XGBoost
  3. Develop the web app and REST API using Flask, HTML, CSS, and AJAX
  4. Deploy the web app on…
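Step 2 depends on turning the raw wait-time history into model features. Below is a minimal sketch of that feature-engineering step, assuming an hourly, gap-free series of wait times in minutes. The feature set (calendar fields, a hypothetical `special_days` set of dates, and a same-hour-last-week lag) is our assumption, not necessarily what the app uses. Only features that are known at least 7 days ahead are included, so a model trained on them can forecast the whole coming week.

```python
from datetime import date, datetime, timedelta

# Forecast horizon of the app: 7 days of hourly steps.
WEEK_HOURS = 7 * 24


def make_features(timestamps, waits, special_days=frozenset()):
    """Build (feature rows, targets) from an hourly wait-time series.

    Assumes `timestamps` are consecutive hours with no gaps and `waits`
    holds the matching wait times in minutes. `special_days` is a
    hypothetical set of dates (holidays, Black Friday, ...) to flag.
    """
    rows, targets = [], []
    for i in range(WEEK_HOURS, len(waits)):
        ts = timestamps[i]
        rows.append({
            "hour": ts.hour,
            "weekday": ts.weekday(),  # Monday=0 ... Sunday=6
            "month": ts.month,
            "special_day": int(ts.date() in special_days),
            # Same hour one week earlier: already observed for any horizon up to 7 days.
            "lag_168h": waits[i - WEEK_HOURS],
        })
        targets.append(waits[i])
    return rows, targets


if __name__ == "__main__":
    start = datetime(2019, 11, 22)
    stamps = [start + timedelta(hours=h) for h in range(200)]
    waits = [float(h % 24) for h in range(200)]  # toy values, not real data
    X, y = make_features(stamps, waits, special_days={date(2019, 11, 29)})
    print(len(X), X[0], y[0])
```

The resulting rows could then be converted to a numeric matrix and passed to an XGBoost regressor, for example `xgboost.XGBRegressor().fit(X, y)`; that call is shown only as an illustration of where the features would go.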


During the COVID-19 pandemic, people have taken their worries, concerns, frustrations, and love to social media to share with the rest of the world. Twitter has become one of the official channels through which world leaders communicate with their supporters and followers.

To understand what keeps them busy, we extract the tweets of two world leaders, Donald Trump (the President of the United States) and Justin Trudeau (the Prime Minister of Canada). By applying natural language processing (NLP) techniques and the Latent Dirichlet Allocation (LDA) algorithm, we can learn the topics of their tweets and see what is on their minds during the crisis.
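To make the LDA step concrete, here is a minimal, self-contained sketch of LDA fitted with collapsed Gibbs sampling in plain Python. It is an illustration, not necessarily how the model in this study was trained (libraries such as gensim or scikit-learn are the usual choice). The hyperparameters `alpha` and `beta`, the iteration count, and the toy documents are assumptions, and the input is assumed to be already-tokenized tweets.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns (doc_topic, topic_word): doc_topic[d][k] estimates P(topic k | doc d)
    and topic_word[k][w] estimates P(word w | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    topics = range(num_topics)
    ndk = [[0] * num_topics for _ in docs]           # topic counts per document
    nkw = [dict.fromkeys(vocab, 0) for _ in topics]  # word counts per topic
    nk = [0] * num_topics                            # total tokens per topic
    z = []                                           # topic assignment of every token

    # Random initial assignment of each token to a topic.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Take the current token out of the counts.
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional p(z_i = t | all other assignments), up to a constant.
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                # Put the token back under its newly sampled topic.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    doc_topic = [[(ndk[d][k] + alpha) / (len(doc) + num_topics * alpha) for k in topics]
                 for d, doc in enumerate(docs)]
    topic_word = [{w: (nkw[k][w] + beta) / (nk[k] + V * beta) for w in vocab}
                  for k in topics]
    return doc_topic, topic_word


def top_words(topic_word, n=5):
    """Most probable words of each topic, used to interpret and label topics."""
    return [sorted(dist, key=dist.get, reverse=True)[:n] for dist in topic_word]


if __name__ == "__main__":
    toy_docs = [["economy", "jobs", "tax"], ["virus", "mask", "test"],
                ["jobs", "economy", "trade"], ["virus", "test", "hospital"]]
    doc_topic, topic_word = lda_gibbs(toy_docs, num_topics=2)
    print(top_words(topic_word, n=3))
```

Reading the top words of each topic is what lets a human attach a label such as "economy" or "public health" to it.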

credit: Huffington Post

We…


The HR Technology Conference and Expo, the world’s leading and largest conference for HR and IT professionals, took place in Las Vegas from Oct 1–4, 2019. A huge number of HR technology topics were covered at the conference. Unfortunately, not everyone could attend, including me. Is it possible to tell what the buzzwords and topics were without being there? The answer is YES! We dug into Twitter for some quick insights.

We scrape tweets with #HRTechConf and build a Latent Dirichlet Allocation (LDA) model to automatically detect and interpret the topics in the tweets. Here is the pipeline of the work:
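As a rough, hypothetical sketch of the text-cleaning stage that usually precedes LDA in a pipeline like this one, the code below keeps only tweets tagged #HRTechConf, strips URLs and @mentions, tokenizes, drops stopwords, and builds bag-of-words counts. The small stopword list and the regular expressions are assumptions for illustration, not the authors' code.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption); real pipelines use a fuller one.
STOPWORDS = {"a", "an", "and", "at", "for", "in", "is", "it", "of", "on", "rt", "the", "to", "with"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
TOKEN_RE = re.compile(r"[a-z][a-z0-9']+")  # words of two or more characters


def clean_tokens(tweet):
    """Lowercase, strip URLs and @mentions, keep hashtag words, drop stopwords."""
    text = URL_RE.sub(" ", tweet.lower())
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")  # "#hrtechconf" becomes the ordinary token "hrtechconf"
    return [t for t in TOKEN_RE.findall(text) if t not in STOPWORDS]


def build_corpus(tweets, tag="#hrtechconf"):
    """Keep tweets carrying `tag`; return token lists, vocabulary, and bag-of-words counts."""
    tag = tag.lower()
    docs = [clean_tokens(t) for t in tweets if tag in HASHTAG_RE.findall(t.lower())]
    vocab = sorted({w for doc in docs for w in doc})
    bow = [Counter(doc) for doc in docs]
    return docs, vocab, bow


if __name__ == "__main__":
    sample = ["Great keynote on AI in HR at #HRTechConf https://t.co/xyz @someone",
              "Lunch time!",
              "RT @hrtech: #HRTechConf people analytics is the future"]
    docs, vocab, bow = build_corpus(sample)
    print(docs)
```

The tweets above are invented. The token lists are the form an LDA implementation typically expects as input.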


Exploring Airbnb listing prices across Toronto neighborhoods and identifying hot/cold areas

In exploratory data analysis (EDA), we often calculate correlation coefficients and present the results in a heatmap. A correlation coefficient measures the statistical relationship between two variables: its value shows how strongly, and in which direction, a change in one variable is associated with a change in the other, e.g. quantity purchased vs. price. Correlation analysis is an important step in predictive analytics before building a model.
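As a minimal illustration of what such a heatmap is built from, the sketch below computes the Pearson correlation coefficient between every pair of columns in plain Python. Pearson is assumed here as the coefficient (Spearman and Kendall are common alternatives), and the `price`/`quantity` data is invented.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    # Raises ZeroDivisionError if either column is constant (correlation undefined).
    return cov / (sd_x * sd_y)


def correlation_matrix(columns):
    """columns: {name: values}. Returns {name_a: {name_b: r}}, the grid a heatmap displays."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


if __name__ == "__main__":
    data = {
        "price": [1.0, 2.0, 3.0, 4.0, 5.0],
        "quantity": [10.0, 8.0, 6.0, 4.0, 2.0],  # invented: demand falls as price rises
    }
    for name, row in correlation_matrix(data).items():
        print(name, {k: round(v, 2) for k, v in row.items()})
```

Values near +1 or -1 indicate a strong linear relationship; the diagonal is always 1 because every column correlates perfectly with itself.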

But how do we measure statistical relationships in a spatial dataset with geographic locations? Conventional EDA and correlation analysis ignore location and treat geo coordinates like any other regular feature. …
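One widely used answer, named here as our assumption rather than taken from the excerpt, is spatial autocorrelation. Global Moran's I measures whether similar values (for example, hypothetical median listing prices per neighborhood) cluster among neighboring areas. Values above the null expectation of -1/(n-1) suggest clustering, and negative values suggest that high and low values alternate. The binary "areas in a row" weights below are a toy assumption; real analyses derive weights from neighborhood boundaries.

```python
def morans_i(values, weights):
    """Global Moran's I.

    values:  n numbers, e.g. a median listing price per neighborhood (hypothetical input).
    weights: n x n spatial weights, weights[i][j] > 0 when areas i and j are neighbors,
             0 on the diagonal.

    I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2,
    where W is the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    spread = sum(d * d for d in dev)
    return (n / total_weight) * cross / spread


def chain_weights(n):
    """Binary contiguity weights for n areas in a row (area i touches i-1 and i+1)."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    w = chain_weights(4)
    print(morans_i([100, 110, 300, 320], w))  # high prices sit next to each other: positive I
    print(morans_i([100, 300, 110, 320], w))  # high and low alternate: negative I
```

To map hot and cold areas rather than summarize the whole city, local statistics such as local Moran's I (LISA) or Getis-Ord Gi* are typically used; spatial libraries such as PySAL implement these along with significance tests.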


Predicting someone’s demographic attributes from the limited information available is always a hot topic. It is common to train gender-prediction models on people’s names, ethnicity, location, and pictures. Can you actually guess someone’s gender based only on what they share on Twitter? We will explore this using NLP techniques in this article.
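To illustrate what guessing a label from tweet text looks like as an NLP task, here is a minimal text classifier: multinomial Naive Bayes with add-one smoothing over word counts, written in plain Python. The technique is our choice for illustration and may differ from the article's models. The toy tweets are invented, and the placeholder labels `"a"` and `"b"` stand in for the gender classes. With only a few words per tweet, each prediction rests on very little evidence.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesText:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def log_scores(self, text):
        """log P(class) + sum of log P(word | class) for each token in `text`."""
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for c in self.class_counts:
            s = math.log(self.class_counts[c] / n_docs)
            for w in tokenize(text):
                s += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
            scores[c] = s
        return scores

    def predict_proba(self, text):
        scores = self.log_scores(text)
        top = max(scores.values())  # subtract the max for numerical stability
        exp = {c: math.exp(s - top) for c, s in scores.items()}
        z = sum(exp.values())
        return {c: e / z for c, e in exp.items()}

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


if __name__ == "__main__":
    texts = ["game tonight great game", "won the game", "new recipe tonight", "baking a new recipe"]
    labels = ["a", "a", "b", "b"]
    clf = NaiveBayesText().fit(texts, labels)
    print(clf.predict("great game"), clf.predict_proba("great game"))
```

On real data this would be trained on thousands of labeled tweets and evaluated on a held-out set.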

In the end, we conclude that it is quite challenging to predict someone’s gender from a single tweet. …


On March 11, the WHO declared the outbreak of the Novel Coronavirus, a new disease that has spread around the world, a pandemic. Many countries have reported cases of the virus.

To help track and understand the daily spread of the virus, I built this Power BI dashboard. It provides an overview of confirmed and recovered COVID-19 cases worldwide and is updated daily from the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) Coronavirus repository.

Some of the measures to highlight:

  • Number of Confirmed Cases per Million People: This puts the total number of confirmed Coronavirus…
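A per-million measure is a simple normalization: confirmed cases divided by population, times one million, so that countries of very different sizes can be compared. In Power BI this would be written as a DAX measure (for example with `DIVIDE`); the Python sketch below only illustrates the arithmetic. It assumes population figures come from a separate source, and the country names and numbers are hypothetical.

```python
def per_million(confirmed, population):
    """Confirmed cases per one million people."""
    if population <= 0:
        raise ValueError("population must be positive")
    return confirmed * 1_000_000 / population


def rank_per_million(data):
    """data: {country: (confirmed, population)} -> [(country, rate), ...], highest first."""
    rates = {c: per_million(conf, pop) for c, (conf, pop) in data.items()}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    toy = {"Country A": (1_200, 40_000_000), "Country B": (300, 5_000_000)}
    for country, rate in rank_per_million(toy):
        print(f"{country}: {rate:.1f} per million")
    # Country B: 60.0 per million
    # Country A: 30.0 per million
```

Country A has four times as many cases in absolute terms, yet Country B ranks higher once population is taken into account, which is exactly what the normalization is meant to reveal.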

The year 2020 is not off to a good start. The ongoing Coronavirus outbreak that originated in Wuhan, China, has infected thousands of people worldwide and killed hundreds, and the numbers are still rising every day. With quarantine controls in place and vaccines in development, we hope this global epidemic will soon be under control.

When facing such a global challenge, we take our emotions and concerns to social media and share Coronavirus news with others. Since the outbreak began, hundreds of thousands of tweets about the Coronavirus have been posted every day. …


Visualizing the Twitter social network of HRanalytics

Every day, people use social media such as Twitter to share thoughts and ideas. People with similar interests come together and interact on the platform by re-sharing or replying to posts they like. Studying how people interact on social networks helps us understand how information spreads and identify the most prominent figures.

In our last post, we ran a topic modeling study on #HRTechConf tweets and trained a model to learn their topics. In this article, we analyze Twitter user interactions and visualize them in an interactive graph. …
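A minimal sketch of the interaction analysis, assuming each retweet or reply has already been reduced to a hypothetical `(source_user, target_user)` pair: it builds a weighted directed graph and ranks users by in-degree, i.e. how many distinct users retweeted or replied to them. In-degree is one simple proxy for prominence; centrality measures such as PageRank or betweenness are common alternatives. The user names are invented.

```python
from collections import Counter, defaultdict


def build_graph(interactions):
    """Weighted directed graph: edges[src][dst] = times src retweeted or replied to dst."""
    edges = defaultdict(Counter)
    for src, dst in interactions:
        if src != dst:  # ignore self-replies
            edges[src][dst] += 1
    return edges


def most_prominent(edges, top=3):
    """Rank users by in-degree: the number of distinct users who retweeted or replied to them."""
    in_degree = Counter()
    for targets in edges.values():
        for dst in targets:
            in_degree[dst] += 1
    return in_degree.most_common(top)


if __name__ == "__main__":
    # Invented (source, target) pairs, e.g. ("ann", "hr_guru") = ann retweeted hr_guru.
    pairs = [("ann", "hr_guru"), ("bob", "hr_guru"), ("cat", "hr_guru"),
             ("ann", "bob"), ("bob", "hr_guru"), ("dan", "dan")]
    graph = build_graph(pairs)
    print(most_prominent(graph))  # [('hr_guru', 3), ('bob', 1)]
```

The edge weights (repeat interactions) are kept so they can drive edge thickness when the graph is drawn, for instance after loading the edge list into a graph library such as networkx or a tool such as Gephi.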


What Employees Like and Dislike the Most (Credit: TopResume)

I work in people analytics and have always wondered what makes employees feel great or bad about their companies. Is it money? Workload? Opportunities to grow? Or the team around them? I know the answer depends on the company, but is there anything common to the companies that employees like or dislike the most?

I went to Glassdoor for help. Glassdoor is one of the world’s largest job sites, where employees anonymously review their current or former employers. I based my study on the 6,000 companies that have an office in Vancouver, BC.

Peng Wang

Data & Analytics Professional. Connect with me on LinkedIn: https://www.linkedin.com/in/peng-wang-cpa/
