The client was a startup that wanted to create a system for parsing and classifying resumes automatically.
Searching for and shortlisting candidates is a persistent pain point for HR teams because it requires a lot of manual effort. The client wanted to shortlist candidates without anyone going through thousands of resumes and matching each one against the requirements manually. Resumes contain a lot of unstructured data, so identifying the text and then categorizing it appropriately was the biggest challenge. Every resume also differs in structure, length, industry, and skill set, and this inconsistency added further complexity. Moreover, the limited availability of resumes made it hard to acquire a large set of sample data for building an intelligent engine.
We analyzed hundreds of resumes before starting the actual planning of the project. We found that resumes are mostly unstructured and that their language and parameters differ considerably from one industry to another. There are countless ways to write dates, job titles, employment history, skills, and personal details in a resume, so we needed to understand the context in which words occur and the relationships between them. We decided that a fairly large set of sample resumes was needed to reach the desired accuracy. For the first phase of the project, we decided to focus on only one industry (IT/Software) and one language (English).
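To illustrate why date handling alone needs care, here is a minimal sketch of normalizing a few common month-and-year spellings with rules. The function name `parse_month_year` and the set of supported formats are assumptions made for illustration; they are not taken from the project, and a real resume parser would have to cover many more variants.

```python
import re
from datetime import date

# Map three-letter month prefixes to month numbers (illustrative helper).
MONTHS = {
    name: number
    for number, name in enumerate(
        ["jan", "feb", "mar", "apr", "may", "jun",
         "jul", "aug", "sep", "oct", "nov", "dec"],
        start=1,
    )
}


def parse_month_year(text):
    """Return the first day of the month for a few common spellings.

    Handles forms such as 'Jan 2019', 'January, 2019', 'Sept. 2019',
    '01/2019' and '2019-01'. Returns None for anything else.
    """
    t = text.strip().lower()

    # Month name (full or abbreviated) followed by a four-digit year.
    m = re.fullmatch(r"([a-z]{3})[a-z]*\.?,?\s+(\d{4})", t)
    if m and m.group(1) in MONTHS:
        return date(int(m.group(2)), MONTHS[m.group(1)], 1)

    # Numeric month first, e.g. 01/2019 or 1-2019.
    m = re.fullmatch(r"(\d{1,2})[/-](\d{4})", t)
    if m and 1 <= int(m.group(1)) <= 12:
        return date(int(m.group(2)), int(m.group(1)), 1)

    # Year first, e.g. 2019-01 or 2019/1.
    m = re.fullmatch(r"(\d{4})[/-](\d{1,2})", t)
    if m and 1 <= int(m.group(2)) <= 12:
        return date(int(m.group(1)), int(m.group(2)), 1)

    return None


variants = ["Jan 2019", "January, 2019", "01/2019", "2019-01"]
normalized = [parse_month_year(v) for v in variants]
```

All four spellings in `variants` normalize to the same date, while unsupported input returns None so that it can be flagged for review instead of being guessed.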
We started by parsing resumes and extracting text data with a rule-based system. We followed a sectional approach: we divided each CV into small sections, and the model then handled each section individually. We used various open-source text extraction and Natural Language Processing libraries to identify entities and contexts. We stored all parsed data in a database for future reference. We developed an AI-driven classification engine for resumes by training it on sample resumes. We provided a panel where the client can search and shortlist candidates based on the extracted text and contexts. We also developed an AI-driven recommendation engine that assigns each candidate a score based on their credentials.
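The sectional approach described above can be sketched roughly as follows. This is an assumption-laden illustration, not the project's actual code: the heading keywords in `SECTION_HEADINGS`, the function names, and the sample resume are all hypothetical, and it assumes the resume has already been converted to plain text.

```python
import re

# Hypothetical heading keywords; a real system would need a far larger list.
SECTION_HEADINGS = {
    "summary": ("summary", "objective", "profile"),
    "experience": ("experience", "work experience", "employment history"),
    "education": ("education", "academic qualifications"),
    "skills": ("skills", "technical skills"),
}


def classify_heading(line):
    """Return a section name if the line looks like a heading, else None."""
    cleaned = re.sub(r"[^a-z ]", "", line.lower()).strip()
    for section, keywords in SECTION_HEADINGS.items():
        if cleaned in keywords:
            return section
    return None


def split_sections(text):
    """Group non-empty resume lines under the most recent recognized heading."""
    sections = {"header": []}  # lines before the first heading
    current = "header"
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        heading = classify_heading(line)
        if heading:
            current = heading
            sections.setdefault(current, [])
        else:
            sections[current].append(line)
    return sections


sample = """Jane Doe
jane@example.com

Work Experience
Software Engineer, Acme Corp, Jan 2019 - Present

Technical Skills:
Python, MongoDB, jQuery

Education
B.Tech in Computer Science, 2018
"""

sections = split_sections(sample)

# Each section can now be handled by its own rules, e.g. a comma-split for skills.
skills = [s.strip() for s in ",".join(sections.get("skills", [])).split(",") if s.strip()]
```

Once the text is split this way, each section can be passed to a parser suited to it, for example a date and job-title extractor for `experience` and a simple tokenizer for `skills`.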
The final product is an intelligent system capable of parsing and classifying resumes automatically with acceptable accuracy. The system is designed so that good candidate profiles are not overlooked. Candidates can be shortlisted automatically and matched against predefined requirements. The next phase of the project will use Deep Learning and Machine Learning to improve accuracy over time.
Python, MongoDB, jQuery, HTML5, CSS, Machine Learning, Artificial Intelligence, Deep Learning, Natural Language Processing