Inspiration

So you're a high school graduate who's been told that your network is your net worth. Or maybe you're a little further along and recently took the plunge into a career transition into tech. Either way, there are always people who need to network but don't quite know how, from 18-year-olds right through to late-career switchers. But what is networking? How does one network?

Sammy said it best in the HT6 opening ceremony: networking goes a long way. Whether it's being able to find out more about job roles and the industry than the internet lets on, or being able to entirely bypass certain parts of the application stream, when done effectively, networking has the ability to truly enhance careers.

With the rise of LinkedIn as the go-to social platform for networking, we wanted to make the platform that little bit more accessible for all the different industry newcomers. We wanted to give these people a little support in adjusting to networking in the corporate tech industry. LinkedOutReach is the set of training wheels newcomers need before taking on the open road.

What it does

LinkedOutReach leverages NLP and machine learning techniques to scrape a user's LinkedIn profile data and cross-reference it against LinkedIn's own data on its public users via its API (an approach sourced from an open-source guide [1]). It matches the user with people who have already built their platforms and share similarities with them, then uses these commonalities to craft a personalised example message the user can send to get to grips with connecting with other people.

How we built it

Front End:

  • A LinkedIn URL is taken in on the home page
  • The URL is sent to the back end, where it is used for web scraping, before results are sent back to the front end
  • The scraped data is stored in MongoDB
  • The machine learning model matches the user against other profiles
  • Once the final LinkedIn message is produced, it's sent back to the front end for display
  • Used React (with Vite), TypeScript, and Mantine UI for building a responsive and user-friendly interface.
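The request flow above can be sketched in a few lines of Python, with stub functions standing in for the real Selenium scraper, MongoDB store, and message generator. All names here are illustrative, not the actual implementation:

```python
# Hypothetical sketch of the URL -> scrape -> store -> message flow.

def scrape_profile(linkedin_url: str) -> dict:
    # In the real app this step uses Selenium; here a stub returns fake data.
    return {"url": linkedin_url, "headline": "Aspiring developer"}

def store_profile(db: list, profile: dict) -> None:
    # Stands in for a MongoDB insert.
    db.append(profile)

def generate_message(profile: dict) -> str:
    # Stands in for the NLP / message-generation step.
    return f"Hi! I saw your profile ({profile['headline']}) and would love to connect."

def handle_request(linkedin_url: str, db: list) -> str:
    """Front end posts a URL; back end scrapes, stores, and returns a message."""
    profile = scrape_profile(linkedin_url)
    store_profile(db, profile)
    return generate_message(profile)
```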

Back End:

  • Selenium was used to scrape LinkedIn profile data
  • SpaCy and Stanza were utilized for extracting and analysing keywords from profile data (Hugging Face Transformers pipeline was employed for summarising profile information)
  • The similarity calculation was done manually with scikit-learn's TfidfVectorizer and cosine_similarity
  • Cohere was integrated to generate personalized LinkedIn connection requests
  • MongoDB was used to store and manage extracted profile data
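The TF-IDF similarity step can be sketched as below, assuming each profile has already been reduced to a plain-text blob of extracted keywords. The function name and inputs are illustrative:

```python
# Sketch of ranking candidate profiles by similarity to the user's profile.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_profiles(user_text: str, candidate_texts: list[str]) -> list[tuple[int, float]]:
    """Return (candidate index, score) pairs sorted by cosine similarity."""
    vectorizer = TfidfVectorizer(stop_words="english")
    # Fit on the user plus all candidates so they share one vocabulary.
    matrix = vectorizer.fit_transform([user_text] + candidate_texts)
    # Row 0 is the user; compare it against every candidate row.
    scores = cosine_similarity(matrix[0:1], matrix[1:]).flatten()
    return sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
```

The top-ranked candidates' shared terms are then what the message generator leans on to craft the personalised connection request.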

Challenges we ran into

With any kind of web-scraping project, you face ethical and legal concerns about what you're scraping and what you're using it for. By working through LinkedIn's own API, we adhere to their guidelines.

And, as is hackathon standard, we faced various technical difficulties. Because we were using an automated test instance of Chrome (via WebDriver) to connect to LinkedIn, we would often have to authenticate ourselves. These repeated requests for verification became an issue later on, with consequences such as forced timeouts and overwhelmed caches that forced us to restart our devices. We also faced a steep learning curve, since none of us had ever used platforms such as MongoDB or done web scraping before. This also made integrating the front-end and back-end systems particularly difficult.

We also struggled a little with ideation, since the project flow would ideally include an authentication system as a further security measure (beyond the hardcoded login we had been using). We also needed this because public LinkedIn profiles expose very limited information when you are not logged in, so scraping without logging in would return very sparse data. But building user authentication proved difficult, so we had to spend some time devising a heuristic workaround.

Accomplishments that we're proud of

Our workflow and team split worked perfectly, and we worked incredibly well as a team because of it. Each person took on a large aspect of the project and used their skill set to contribute an equal share. One member handled most of the UI (front end), another UX (front end), another NLP and machine learning (back end), and one member was dedicated to integrating the front and back ends. This worked extremely well.

We're also incredibly proud of completing the project, and of building something with helping people and improving lives in mind.

What we learned

We learnt how to use a range of technical tools: web scraping, NLP toolkits, UI frameworks, a NoSQL database, and more.

We also learnt how many lululemons are in Toronto (71).

What's next for LinkedOut Reach

With LinkedIn's large user base, it would be interesting to see whether LinkedIn Premium offers its own API (since a range of information becomes available that previously was not) and whether this could be used to enhance the matching.

There are also several features we would ideally have implemented that would make for natural extension projects: adding user authentication with Auth0, and offering the option to use your resume instead of scraping your LinkedIn profile. The latter is crucial, since the project aims to help people who have likely never used LinkedIn much, so they are far more likely to have PDF resumes on hand for job applications than polished LinkedIn profiles.

References

[1] Priyono, C. (2023) A comprehensive guide to scraping data from LinkedIn with python, Medium. Available at: https://medium.com/@chatur.agus.priyono/a-comprehensive-guide-to-scraping-data-from-linkedin-with-python-e128c46c5c74 (Accessed: 04 August 2024).
