
Hong Kong IT Job Advertisement Data Science Report (香港IT工招聘廣告數據科學報告)

This project was conducted by Mr. Cyrus Wong, Data Scientist at the Cloud Innovation Centre, IVE (Lee Wai Lee).

This report aims to help IT practitioners, students, and teachers understand the manpower needs of the IT industry in Hong Kong. Students often want to know which technical knowledge and skills are essential for finding a job, e.g. PHP or ASP.NET? Android or iOS?

Data science methods were applied in this project to investigate keywords and extract hidden information from 192,000 IT job advertisements. The results can therefore help you make better decisions about your further study and career development.


Please use Google Chrome on Windows or Safari on Mac (with WebGL enabled) to read this report! You can also learn how to read the results from the explanation in the YouTube clip.

The proof of concept for the technologies behind the report:
Fast Automatic Distributed Parallel Order iPhone 5 - Cloudera Hadoop Cluster


  1. IT Job Skill Index (IT工作技能指數) - Grouped Version OR Full List Version

    A keyword count summary and simple correlation analysis with Higher Diploma. More detailed analysis for a keyword:
    Trend Analysis (趨勢分析) Prediction Tree (預測樹)
    Social Network Terms Analysis (社交網絡用語分析) Geochart (區域圖)
  2. Keyword Density Visualization (關鍵字密度視覺化)
  3. Interactive IT Term Matrix Social Network (互動IT關鍵字矩陣社交網絡圖)
    Analyzes the relations between keywords using social network analysis techniques. Communities (groups) are extracted.

    Analytic Analogy (分析技術類推):
    1. Each keyword is a "person", represented by a node; the node size reflects how often the keyword occurs.
    2. A job advertisement is an "event".
    3. "Keywords appear in a job advertisement" corresponds to "people join the same event".
    4. People who often join events together may be "friends" or have a "relation", which is represented by an edge.
    Developed with Gephi and the Sigmajs Exporter add-on from the Oxford Internet Institute.
  4. Keyword Correlation Analysis (IT工作關鍵字相關性分析) (Big/WebGL) (Grouped Version)
    Shows the relationship between each pair of keywords in an interactive 3D plot.
  5. IT Job Skill Cube (IT工作技術立方) - Principal Components Analysis (主成分分析) (Big/WebGL) (Grouped Version)
    Transforms the variables into principal components, with an interactive 3D plot for exploring their relations.
  6. Job Advertisement Cluster Analysis (IT工作集群分析)
    Cluster Analysis over Keywords
  7. Codebook (關鍵字碼簿)
    It defines the groups, the keywords, and a list of synonyms related to the IT industry.

    You can submit new keywords that you are interested in through the online form, and they will be included in the next report!
  8. Hong Kong IT Job Advertisement Data Mining Framework (香港IT工招聘廣告數據挖掘報告框架) Explanation Slide Show (說明幻燈片)

    Fast Automatic Distributed Parallel Order iPhone 5 (Cloudera Hadoop Cluster)

    The demo shows the big data and cloud technology behind this report. Would you like a copy of this program? Please share and "Like" this page, and it will become an open-source project!
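
The "person / event" analogy used for the Interactive IT Term Matrix Social Network (item 3) can be illustrated with a minimal Python sketch. This is not the report's actual pipeline, which was built with Gephi. The sample advertisements and the function name `build_cooccurrence` are hypothetical and exist only for illustration. Node size is taken as the number of advertisements containing a keyword, and edge weight as the number of advertisements in which two keywords appear together.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample data: each job advertisement ("event") is reduced
# to the set of keywords ("people") that appear in it.
ads = [
    {"PHP", "MySQL", "JavaScript"},
    {"PHP", "MySQL"},
    {"Android", "Java"},
    {"iOS", "Objective-C"},
    {"Android", "Java", "MySQL"},
]

def build_cooccurrence(ads):
    """Return (node_sizes, edge_weights) for a keyword co-occurrence network.

    node_sizes[k]        = number of advertisements containing keyword k
    edge_weights[(a, b)] = number of advertisements containing both a and b,
                           with a < b in sort order so each pair has one key
    """
    node_sizes = Counter()
    edge_weights = Counter()
    for keywords in ads:
        node_sizes.update(keywords)
        # Every pair of keywords in the same advertisement "joined the same event".
        for a, b in combinations(sorted(keywords), 2):
            edge_weights[(a, b)] += 1
    return node_sizes, edge_weights

node_sizes, edge_weights = build_cooccurrence(ads)

# Print the edges from heaviest to lightest.
for (a, b), weight in edge_weights.most_common():
    print(f"{a} -- {b}: {weight}")
```

The resulting node and edge tables could then be loaded into a graph tool for layout and community detection. The choice of raw co-occurrence counts as edge weights is an assumption of this sketch.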


  1. WebGL - Requires a browser with good WebGL support, i.e. Chrome on Windows or Safari on Mac OS (with WebGL enabled)!
  2. Big - Please be patient while the page loads!

Created by Mr. Cyrus Wong, Data Scientist.
For updates, please join the IVE - Information Technology Facebook page.
For technical explanations, please join the Report Facebook page.
For discussion, please join my Facebook group or connect with me on LinkedIn.

This report is sponsored by:

Amazon Web Services   Cloudera   Lively Impact   JobsDB.com   IVE - Information Technology

Supported by AWS in Research Grant Awards