
Determining device type #354

Open
sa-eneidhart opened this issue Oct 31, 2018 · 2 comments

Comments

@sa-eneidhart

I'd like to use data from uap to categorize devices into mobile and desktop. I have seen #31 and am aware that this is subjective and prone to inaccuracies, but that's OK. I just want to use the information already available to make the most accurate distinction I can; I'm having difficulty finding out what that information is.

For example, usually I get "Other" for device family on desktops. I also know that I may have to make some arbitrary decisions, like categorizing tablets under mobile. What data will I need to make these categorizations?

@allan-simon

allan-simon commented Oct 20, 2020

Hello, I have the same need, and I wonder whether we could start a spin-off project that takes the devices extracted by uap-core and categorizes them.
That way we could pool our effort, stand on the shoulders of giants by not reinventing the parsing, and "just" maintain a huge map:

'device_model1' => 'desktop'
'device_model2' => 'tablet'
'device_model3' => 'other'

(That way, if your use case calls for categorizing tablets as desktop, you can still remap them yourself.)
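A minimal sketch of that idea, assuming a plain lookup table keyed by the device family that uap-core reports. The keys are the placeholder names from the map above, not real uap-core output, and `DEFAULT_CATEGORIES`, `categorize`, and the `remap` parameter are hypothetical names used only for illustration:

```javascript
// Hypothetical shared map from device family to a default category.
// The keys are placeholders, not values uap-core actually emits.
const DEFAULT_CATEGORIES = new Map([
  ['device_model1', 'desktop'],
  ['device_model2', 'tablet'],
  ['device_model3', 'other'],
]);

// Look up a device family and let the caller remap whole categories,
// for example to treat every tablet as a desktop.
// Unknown device families fall back to 'other' (an assumption of this sketch).
function categorize(deviceFamily, remap = {}) {
  const category = DEFAULT_CATEGORIES.get(deviceFamily) ?? 'other';
  return remap[category] ?? category;
}

console.log(categorize('device_model2'));
console.log(categorize('device_model2', { tablet: 'desktop' }));
```

Keeping the remap separate from the shared table means each consumer can make its own arbitrary choices without forking the map.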

@sa-eneidhart
Author

I think that's a great idea, though I'm afraid my need for such a solution has since passed. The goal for me was to separate mobile data from desktop data, as well as try to separate bot traffic from the normal kind. It was known that these groupings are somewhat arbitrary, and no matter what we'd likely have data that either didn't fit in neatly or was too incomplete to make any claims with any sort of accuracy. I was hoping to receive some help in the form of heuristics which would make such categorizations pretty easy for the majority of cases, but as you can see 2 years have passed and no such thing happened, so I had to come up with my own.

If you're interested, this became my method for categorization. I hope it helps you out!!

  1. Extract the following bits of data: device family, OS family, and browser family, as well as the ua string.
  2. Separate bots from humans (arguably the most complex part). If any of the following were true, we decided it was probably a bot.
    i. the device family is "Spider"
    ii. the text "headless", "bot", "crawler", or "spider" appears in the browser family
    iii. The browser family is "PhantomJS"
    iv. The ua string contains "SlimerJS" or "Google-Structured-Data-testing-Tool"
    Note: these checks seemed to cover most of the obvious web crawlers and headless browsers in our data set, though I'm sure plenty of others exist.
  3. Use the OS family to separate desktop from mobile by checking against a library of known OS names, which I'll list below.
  4. Failing that, do the same for browser family.
  5. Failing that, do the same by device family (this could get really extensive, and we rarely ever got to this point, so we just checked if the device family was "iPad". If you need to separate tablets into their own category, this step would be a nightmare).
  6. If we still don't know anything, log that an error occurred and log all the data we have so the heuristics can be adjusted later. Call it a desktop for the time being.
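For anyone who wants something concrete, step 2 could be sketched roughly like this. The function name `isProbablyBot` and the shape of its argument are made up for illustration, and matching case-insensitively is an assumption of this sketch rather than something the steps above specify:

```javascript
// Substrings from step 2, stored lowercase so matching can ignore case.
const BOT_BROWSER_HINTS = ['headless', 'bot', 'crawler', 'spider'];
const BOT_UA_HINTS = ['slimerjs', 'google-structured-data-testing-tool'];

// Returns true if any of the step 2 conditions holds.
// Input fields are hypothetical names for the values extracted in step 1.
function isProbablyBot({ deviceFamily, browserFamily, uaString }) {
  // i. device family is "Spider"
  if (deviceFamily === 'Spider') return true;

  // ii. browser family contains one of the hint words
  const browser = (browserFamily || '').toLowerCase();
  if (BOT_BROWSER_HINTS.some((hint) => browser.includes(hint))) return true;

  // iii. browser family is "PhantomJS"
  if (browserFamily === 'PhantomJS') return true;

  // iv. ua string mentions one of the known tools
  const ua = (uaString || '').toLowerCase();
  return BOT_UA_HINTS.some((hint) => ua.includes(hint));
}
```

Substring checks like "bot" are deliberately broad, so a real data set may need an allow-list for false positives.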

Here are the libraries of known OS and browser names. I'm sure there are a ton more out there, but this is what was in our data set at the time. Also, I'd take everything here with a grain of salt: this was all put together with guesswork, aiming for a "good enough" solution to a problem whose solutions are exclusively arbitrary. Again, I hope this is all useful to you, and I wish you the best of luck!

export const OS_NAMES = {
  DESKTOP: {
    CHROME_OS: 'Chrome OS',
    FEDORA: 'Fedora',
    LINUX: 'Linux',
    MAC_OS_X: 'Mac OS X',
    UBUNTU: 'Ubuntu',
    WINDOWS: 'Windows',
  },
  MOBILE: {
    ANDROID: 'Android',
    BLACKBERRY_OS: 'BlackBerry OS',
    FIREFOX_OS: 'Firefox OS',
    IOS: 'iOS',
    WINDOWS_PHONE: 'Windows Phone',
  },
};

export const BROWSER_NAMES = {
  DESKTOP: {
    CHROMIUM: 'Chromium',
    EDGE: 'Edge',
    FIREFOX: 'Firefox',
    IE: 'IE',
    MAXTHON: 'Maxthon',
    OPERA: 'Opera',
    SAFARI: 'Safari',
    SEAMONKEY: 'SeaMonkey',
    VIVALDI: 'Vivaldi',
    YANDEX_BROWSER: 'Yandex Browser',
  },
  MOBILE: {
    AMAZON_SILK: 'Amazon Silk',
    ANDROID: 'Android',
    BLACKBERRY_WEBKIT: 'BlackBerry WebKit',
    CROSSWALK: 'Crosswalk',
    FACEBOOK: 'Facebook',
    FLIPBOARD: 'Flipboard',
    INSTAGRAM: 'Instagram',
    PINTEREST: 'Pinterest',
    SAMSUNG_INTERNET: 'Samsung Internet',
  },
};
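Steps 3 through 6 could then be wired together roughly as follows. This is only a sketch: the names are copied from the two lists above into flat sets so the snippet runs on its own, `classifyDevice` and its `log` parameter are hypothetical, exact string matching against the family names is an assumption, and counting "iPad" as mobile is one of the arbitrary choices mentioned earlier:

```javascript
// Names copied from OS_NAMES and BROWSER_NAMES above, flattened into sets.
const DESKTOP_OS = new Set(['Chrome OS', 'Fedora', 'Linux', 'Mac OS X', 'Ubuntu', 'Windows']);
const MOBILE_OS = new Set(['Android', 'BlackBerry OS', 'Firefox OS', 'iOS', 'Windows Phone']);
const DESKTOP_BROWSERS = new Set([
  'Chromium', 'Edge', 'Firefox', 'IE', 'Maxthon',
  'Opera', 'Safari', 'SeaMonkey', 'Vivaldi', 'Yandex Browser',
]);
const MOBILE_BROWSERS = new Set([
  'Amazon Silk', 'Android', 'BlackBerry WebKit', 'Crosswalk', 'Facebook',
  'Flipboard', 'Instagram', 'Pinterest', 'Samsung Internet',
]);

// Returns 'mobile' or 'desktop' for traffic already judged to be human.
function classifyDevice({ deviceFamily, osFamily, browserFamily }, log = console.warn) {
  // Step 3: OS family first.
  if (MOBILE_OS.has(osFamily)) return 'mobile';
  if (DESKTOP_OS.has(osFamily)) return 'desktop';

  // Step 4: fall back to browser family.
  if (MOBILE_BROWSERS.has(browserFamily)) return 'mobile';
  if (DESKTOP_BROWSERS.has(browserFamily)) return 'desktop';

  // Step 5: fall back to device family (only "iPad" is checked).
  if (deviceFamily === 'iPad') return 'mobile';

  // Step 6: log everything we have, then default to desktop.
  log('Unclassified user agent', { deviceFamily, osFamily, browserFamily });
  return 'desktop';
}
```

Passing the logger in as a parameter makes it easy to collect the unclassified cases and adjust the lists later.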


2 participants