Title: Computational Job Market Analysis with Natural Language Processing

Authors: Mike Zhang
Abstract: [Abridged Abstract]
Recent technological advances underscore labor market dynamics, yielding significant consequences for employment prospects and increasing job vacancy data across platforms and languages. Aggregating such data holds potential for valuable insights into labor market demands and the emergence of new skills, and for facilitating job matching for various stakeholders. However, despite prevalent insights in the private sector, transparent language technology systems and data for this domain are lacking. This thesis investigates Natural Language Processing (NLP) technology for extracting relevant information from job descriptions, identifying challenges including scarcity of training data, lack of standardized annotation guidelines, and a shortage of effective methods for extracting information from job ads. We frame the problem, obtain annotated data, and introduce extraction methodologies. Our contributions include job description datasets, a de-identification dataset, and a novel active learning algorithm for efficient model training. We propose skill extraction using weak supervision, a taxonomy-aware pre-training methodology adapting multilingual language models to the job market domain, and a retrieval-augmented model leveraging multiple skill extraction datasets to enhance overall performance. Finally, we ground extracted information within a designated taxonomy.
Comments: Ph.D. Thesis (315 total pages, 52 figures). The thesis slightly modified with this https URL. ISBN (electronic): 978-87-7949-414-5
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2404.18977 [cs.CL]
  (or arXiv:2404.18977v1 [cs.CL] for this version)

Submission history

From: Mike Zhang
[v1] Mon, 29 Apr 2024 14:52:38 GMT (7361kb,D)
