
Extension: Add Linkedin profile to Twenty #2413

Closed
Bonapara opened this issue Nov 9, 2023 · 20 comments
@Bonapara (Member) commented Nov 9, 2023

Scope & goals

Enable users to add People & Companies records to Twenty from LinkedIn's People or Company pages. This requires a Chrome Extension, which adds a button on LinkedIn pages and scrapes page content when pressed.

Desired behavior

  1. I click on the "Add to Twenty" button on LinkedIn.
  2. We scrape the data from the LinkedIn page.
  3. We call the Twenty API to create a new company / people record.

The extension logo should be the "Twenty" logo.

What we want to import

People

  • Name
  • Profile picture
  • Job title [from most recent experience]
  • City
  • LinkedIn URL

Companies

[parsed from the most recent experience]

  • Name
  • Domain name
  • LinkedIn URL
  • Address
  • Number of employees (Use the interval's lower value.)
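On the employee count, LinkedIn displays a range rather than an exact number, and the spec says to keep the interval's lower value. A small helper could look like this (a sketch; the function name and the input formats are my assumptions, not the extension's actual code):

```typescript
// Hypothetical helper: LinkedIn shows company size as a range such as
// "51-200 employees" or "10,001+ employees"; we keep the lower bound.
function parseEmployeeLowerBound(raw: string): number | undefined {
  // Strip thousands separators, then take the first integer in the string.
  const match = raw.replace(/,/g, "").match(/\d+/);
  return match ? parseInt(match[0], 10) : undefined;
}
```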

Tech specs

Create a basic Chrome extension

Guide to create Chrome extensions:
https://developer.chrome.com/docs/extensions/mv3/getstarted/

For now you can create the extension in packages/twenty-chrome-extension

You can use a React/TypeScript generator such as this one: https://github.com/guocaoyi/create-chrome-ext/tree/main/template-react-ts (it uses Vite while we use Webpack elsewhere in this repo, but I think that's okay since we might move to Vite anyway). Also, please use yarn, not npm.

Make sure to add a README that gives readers basic instructions on how to test the extension locally.
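For illustration, a minimal Manifest V3 file for this kind of extension might look like the following (all values here are placeholders and assumptions, not the actual manifest shipped in packages/twenty-chrome-extension):

```json
{
  "manifest_version": 3,
  "name": "Twenty",
  "version": "0.1.0",
  "description": "Add People & Companies from LinkedIn to Twenty",
  "icons": { "128": "logo-128.png" },
  "content_scripts": [
    {
      "matches": [
        "https://www.linkedin.com/in/*",
        "https://www.linkedin.com/company/*"
      ],
      "js": ["src/contentScript.js"]
    }
  ],
  "permissions": ["storage"]
}
```

The content script is what injects the button on matching LinkedIn pages, and the "storage" permission covers the settings page described below.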

Insert a button in DOM

[Screenshot: mockup of the "Add to Twenty" button placement on a LinkedIn profile page]
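The injection itself could be sketched roughly as follows; the markup, IDs, and the CSS selector are illustrative assumptions, not LinkedIn's actual class names:

```typescript
// Hypothetical content-script sketch: build the "Add to Twenty" button markup.
// Keeping markup construction in a pure function makes it testable outside the browser.
function buildAddToTwentyButton(): string {
  return `<button id="twenty-add-button" type="button">Add to Twenty</button>`;
}

// In the content script itself (browser-only, so shown as a comment):
// const actions = document.querySelector(".pv-top-card-v2-ctas"); // assumed selector
// actions?.insertAdjacentHTML("beforeend", buildAddToTwentyButton());
```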

Scrape data from LinkedIn

Should we use a backend-api approach? E.g.
https://github.com/atul-gairola/LinkedIn-crm-extension/blob/6160e6cd4853225c5adf5b5c7ef2fa7a6a805696/src/pages/Background/index.js#L73

Or just scrape the frontend?

I'm not 100% sure which approach is best. The backend approach feels more robust/stable, but if overused it's more likely to get flagged and eventually get the account banned. If we go with the frontend approach, I'm not sure React alone will be enough to do this well; we may need to bring in our good old friend jQuery alongside it.

Push data to Twenty

Create a company and contact via GraphQL API Call
See: https://docs.twenty.com/graphql/

I'm not sure the API supports passing the company via createOnePerson, so it might be safer to do a first API call to createOneCompany and then create the person.
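The two-call sequence (company first, then person) could be sketched as below; the mutation and input field names are assumptions loosely based on the docs link above, not verified against Twenty's actual GraphQL schema:

```typescript
// Hypothetical sketch of the two-step create flow: company first, then person.
// Mutation and field names are assumptions; check https://docs.twenty.com/graphql/.
interface GraphQLRequest {
  query: string;
  variables: Record<string, unknown>;
}

function buildCreateCompanyRequest(
  name: string,
  domainName: string,
  linkedinUrl: string,
): GraphQLRequest {
  return {
    query: `mutation CreateCompany($data: CompanyCreateInput!) {
      createOneCompany(data: $data) { id }
    }`,
    variables: { data: { name, domainName, linkedinUrl } },
  };
}

function buildCreatePersonRequest(
  firstName: string,
  lastName: string,
  companyId: string,
): GraphQLRequest {
  return {
    query: `mutation CreatePerson($data: PersonCreateInput!) {
      createOnePerson(data: $data) { id }
    }`,
    variables: { data: { firstName, lastName, companyId } },
  };
}

// Usage: POST each request as JSON to the Twenty GraphQL endpoint with the
// user's API key in the Authorization header, company first, person second.
```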

Create a basic settings page

[Video: Screen.Recording.2023-11-10.at.12.32.25.mov]

https://www.figma.com/file/xt8O9mFeLl46C5InWwoMrN/Twenty?type=design&node-id=15167-54916&mode=design&t=y0dkApZGbbNBw27k-11

Store info in the extension's local storage

Automatically open settings

Automatically open the settings page upon install, and when the user clicks the button but no API key is in storage.
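The storage and auto-open logic could be sketched like this (a rough sketch; the helper names and the "apiKey" key are my assumptions, with an in-memory fallback so the logic also runs outside a browser):

```typescript
// Hypothetical settings helpers: use chrome.storage.local when it exists
// (MV3 exposes promise-based get/set), otherwise fall back to a Map.
const memoryStore = new Map<string, string>();

async function setSetting(key: string, value: string): Promise<void> {
  const local = (globalThis as any).chrome?.storage?.local;
  if (local) {
    await local.set({ [key]: value });
  } else {
    memoryStore.set(key, value);
  }
}

async function getSetting(key: string): Promise<string | undefined> {
  const local = (globalThis as any).chrome?.storage?.local;
  if (local) {
    const result = await local.get(key);
    return result[key];
  }
  return memoryStore.get(key);
}

// The "automatically open settings" rule: open on install, or when the user
// clicks the button and no API key has been stored yet.
function shouldOpenSettings(apiKey: string | undefined): boolean {
  return apiKey === undefined || apiKey.trim() === "";
}
```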

For future iterations

Just for context, and to help you structure things, in the future we will likely:

  • query Twenty upon page load to see whether the record already exists, and update the "Add" button accordingly
  • add a button on other pages (e.g. lists of people, or company detail pages)
  • add a modal during onboarding to encourage users to install the integration
  • simplify the process by moving from a user-provided API key to a simpler OAuth flow
  • add other features to the extension (e.g. integrate other networks like Twitter, or features for other tools such as Google Meet, Google Calendar...)
  • let users select a specific list that contacts are added to
  • add more standard fields on people/companies (e.g. company industry) that will be filled directly from the extension
@shavidze (Contributor):

Is someone working on it?

@mabdullahabaid (Contributor):

I had a conversation with Felix about this issue on Discord. Please assign it to me so I can start working on this.

@Bonapara (Member, Author):

Hi @mabdullahabaid just assigned you. Maybe @shavidze can help you?

@shavidze (Contributor):

@Bonapara anytime

@Bonapara (Member, Author):

Maybe the two of you can sync on Discord then 🤷‍♂️🤗

@mabdullahabaid (Contributor):

Hi @Bonapara, thank you for assigning. Happy to work together, ofc.

Please drop me a message on Discord (same username), @shavidze, or share your Discord handle here so I can connect with you.

@mabdullahabaid (Contributor):

@FelixMalfait Here's the progress on Chrome Extension development. Had to upload the video to Google Drive due to the size of the file. Please turn on the audio for commentary.

https://drive.google.com/file/d/1yWjdYmtDwP_vDXXzP9GBadANiE9tCIDk/view

A few things to note:

  • Among the two approaches mentioned in the issue, I am using the data-scraping approach, not the backend-API one, because every extension needs to undergo a manual review when published to the Chrome Web Store. Since the latter approach relies on CSRF-style requests, I think the extension would get flagged in review and never make it to the Web Store. Would love to hear thoughts on this.
  • The implemented logic extracts data for non-logged-in accounts. We can use it on the logged-in account too, but the DOM is a little different, thus returning incorrect data for the most recent job title; this is definitely a problem with scraping data from the frontend.
  • I have not created a draft pull request yet, but you can review the code setup here if you want - have not made any changes except for the twenty-chrome-extension folder in the packages directory.

Will work on connecting to the backend and storing data to the database if the current implementation looks good for now.

@FelixMalfait (Member):

Really great work @mabdullahabaid!

A few suggestions:

  • Add a basic README; eventually it could redirect to a page you create in /docs? Maybe a dedicated page under contributors/frontend/?
  • I wouldn't be too worried about Chrome not accepting the extension, because you have to go pretty deep into the code to understand what an extension is doing. Unless we explicitly use words like "CSRF attack" in the plugin's description, I wouldn't worry.
  • I feel like frontend is the right short-term solution (less intrusive, no ability to be detected, so no need to play hide-and-seek with LinkedIn), and backend is the right long-term solution if we decide to integrate more features (it's way more powerful, so we could do things like import many contacts at once, import conversations, etc.).
  • I'm not clear enough on the long-term roadmap for this plugin to know what the future will look like (e.g. we could prioritize extending this to other social networks, Gmail, recording Google Meet meetings, Google Calendar, etc., vs. going deep on LinkedIn only). I'd say go with frontend as you did for now, and if we eventually have to refactor to a backend approach later, then we'll refactor; that's just the life of software 😅
  • I didn't get the sentence on logged-in vs. non-logged-in; you meant the opposite, no? Because in your video you are logged in and it works. I'd say we don't need to support non-logged-in LinkedIn users, but in that case it just needs to fail gracefully (probably hide the button/feature entirely).
  • I feel like we should modify the backend column "employees" to convert it to an employee range; I might create a ticket for this.
  • Not sure how we'll send the picture; I think we'll need to convert it to base64 (the URL might expire, require an ongoing session, etc.).
  • We should separate first name from last name.
  • We should probably update the address field; I'll talk with @Bonapara about a potential design for this. Otherwise we start adding messy data.
  • It's a shame we have to copy/paste UI components; we should probably publish the /ui module as a standalone npm package asap. But I'm not sure what the best strategy is to have multiple workspaces/packages within one repo, and it might take a couple of weeks to figure out, so we can keep a direct dependency or copy/paste for now (just make sure not to directly edit the files, since they will be replaced by the /ui files).
  • We can add a very small CI with a simple test. Even if we don't cover much, just making sure it builds well is useful!
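The base64 point above could work roughly like this; the helper is a sketch of mine, not the extension's actual code:

```typescript
// Hypothetical sketch: turn fetched image bytes into a base64 data URL so the
// profile picture survives LinkedIn's expiring, session-bound CDN links.
function bytesToDataUrl(bytes: Uint8Array, mimeType: string): string {
  const base64 = Buffer.from(bytes).toString("base64");
  return `data:${mimeType};base64,${base64}`;
}

// On the extension side it would be wired up roughly like (browser-only):
// const response = await fetch(profilePictureUrl);
// const bytes = new Uint8Array(await response.arrayBuffer());
// const dataUrl = bytesToDataUrl(bytes, response.headers.get("content-type") ?? "image/jpeg");
```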

Thanks a lot!

@Bonapara (Member, Author) commented Nov 22, 2023

The address field is ready to be implemented too. Can we collect a structured address format from LinkedIn?

[Screenshot: address field design]

@mabdullahabaid (Contributor):

Hi @FelixMalfait, sorry for the delay; busy week. Back to spending the weekend on this now.

  • Will add a README over the weekend.
  • That's a good point about the Web Store.
  • Yes, I feel like we'd have to move away from DOM scraping; it works for now, but a more sophisticated approach will be needed soon.
  • I agree. My idea is to build the end-to-end flow and then fix things, from updating the data we scrape to completely replacing the method the extension uses to extract it.
  • Let me try to explain the logged-in vs. non-logged-in point again: I use a dummy account named "Abdullah A." to extract data from the DOM. This way, I can extract data for another account (e.g. my main LinkedIn profile, "M. Abdullah Abaid"), but cannot extract data for "Abdullah A." itself. The problem comes from DOM scraping and should be fixed once we integrate a more robust solution.
  • Sounds great. It will definitely be more intuitive.
  • I am scraping images for another project at work, and this is how we do it at the moment: we send URLs from the extension to the backend, then access those URLs immediately via a request from the backend. This gives me a blob, which I stream into S3, creating a file with the relevant extension (determined dynamically). It takes less than a second altogether, and I get a cached copy of the image in S3.
  • Followed your conversation on Discord; it sounds like a great plan, but I believe we need a better way to capture LinkedIn data before we can do this. I will push this requirement to the second iteration: I'll prioritize posting data to the backend this weekend and then look for better ways to extract the data itself.
  • I had to strip down component code because copying the components pulled in dependencies, which had further dependencies. I'll try to reuse code wherever possible, and otherwise keep it similar to what you have; I can also come back and integrate the UI library myself later. Also, I would love to work on the UI library when you decide to build it; it could be a great learning experience, and I expect my time to free up enough to contribute regularly in a couple of weeks.
  • Yes, surely. I will create a GitHub Action to demonstrate successful builds.

@mabdullahabaid (Contributor):

Hi @Bonapara, I followed your conversation on Discord and it sounds like a great idea. It will require changing the way the extension extracts data, so I intend to push this requirement to the next iteration, as mentioned in my reply to Felix above; I'll work on posting data to the backend first and then find a more robust approach to data extraction.

@FelixMalfait (Member):

I was talking with a friend who develops scrapers, and he told me the backend-side approach is very risky and gets accounts banned very easily, so you probably picked the right path @mabdullahabaid. Well done; I would have made the other choice 🙃

@mabdullahabaid (Contributor):

Oh, interesting. I tried avoiding that risk at first for the very same reason, but after receiving feedback I built the extension using both approaches in separate branches: the data-scraping approach here and the backend-API approach here.

Each took a couple of weeks to build, but the data turned out more consistent with the backend-API approach, so I've been stabilizing that version over the past couple of weeks. I will switch back to the data-scraping branch, rebase my changes, and get it working with Nx before creating a PR. I could not integrate all of the suggestions above due to various challenges, but you should be able to try a working extension on your machine very soon. Apologies for the long delay.

@FelixMalfait (Member):

Oh no! Sorry I didn't think it'd have such an impact when I said this. Honestly this is just one data point I got from someone so I'm not sure if we should base everything on that. Do you want to do a quick call with @charlesBochet next week to discuss it together? We can also discuss it async on Discord. But it's too bad if some of your work is lost because of a bad judgement on our side — let's be sure this time 😅

@mabdullahabaid (Contributor):

Learned a lot in the process, so some lost work should not be a problem 😅

However, building further would require locking in one approach, and I think a call would give all of us more clarity at this point. That said, I don't want to mess up your calendars, so if there are no empty slots in the coming week, we can definitely have the conversation async instead, no worries.

@mabdullahabaid (Contributor):

I have updated both branches to work with Nx. I will record a video showcasing the functionality (and issues) today or tomorrow, and create a PR with whichever method we decide on in the call/Discord this week.

@mabdullahabaid (Contributor) commented Jan 9, 2024

The first video demonstrates setting up the extension on your local machine and walks through the implemented functionality.
https://drive.google.com/file/d/1k3Jwlcqrrv69eyBiOd1imfeNuCMO6ojS/view?usp=sharing

The second video demonstrates a few bugs, a few implementation details behind-the-scenes, and a few things we can improve on generally.
https://drive.google.com/file/d/1CyPhTpZoemSL2sIm0PhLPp-aNjDX03EX/view?usp=sharing

Excited to finally share a working version with you guys! Please turn on the volume while viewing the videos.

@FelixMalfait @Bonapara @charlesBochet

@Bonapara (Member, Author):

Thanks @mabdullahabaid for this amazing contribution! Can't wait to add leads from LinkedIn to Twenty 🚀

@mabdullahabaid (Contributor):

Created a PR to close this issue. I tried to include comments throughout the code to explain how certain things are implemented. This was a very fun issue for my first-ever open source contribution, and I really appreciate the trust and support from all of you.

@charlesBochet (Member):

We will take a look early this week @mabdullahabaid. Thanks a lot again!
