Commit a299802

committed
shuffle content
1 parent 4a8e10d commit a299802

23 files changed

Lines changed: 1059 additions & 0 deletions

content/docs/Recsys/mmds.md

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
---
weight: 10
title: MMDS
---

# Mining Massive Datasets

content/docs/about.md

Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,17 @@
1+
+++
2+
title = "About Me"
3+
weight = 20
4+
+++
5+
6+
## About Me
7+
8+
{{< figure src="https://raw.githubusercontent.com/veekaybee/veekaybee.github.io/master/static/images/headshot.jpeg" alt="image" width="200px">}}
9+
10+
11+
Hi! I'm Vicki. I'm a [machine learning engineer at Tumblr.](https://applyingml.com/mentors/vicki-boykis/) working on recommender systems with deep expertise in YAML indentation. I live in Philly with my family. In my free time, I love to write about Life. I also like to think about [what technology means](https://vicki.substack.com/) in the context of society, and write about that, as well. My main site is [here.](https://vickiboykis.com)
12+
13+
I also love tweeting terrible puns and doodling tech logos.
14+
15+
{{< tweet 1471282749780770817 >}}
16+
17+
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
---
bookCollapseSection: true
weight: 10
title: Java
---

# Java
Lines changed: 133 additions & 0 deletions
@@ -0,0 +1,133 @@
---
title: is_set

weight: 2
bookToc: false
---

# isset versus array_key_exists

I dug into the performance of `isset` versus `array_key_exists` in PHP.

Both look at an array and determine whether it has a specific key, but [their behavior is different](https://coderwall.com/p/q9erfw/isset-vs-array_key_exists).

`isset` returns false if the key is missing or if the value stored under that key is null. `array_key_exists` looks only at the key itself.

Using an id as an example:

```php
// 'key1' holds a value, 'key2' holds null
$a = array('key1' => '12345678', 'key2' => null);

isset($a['key1']); // true
isset($a['key2']); // false: the key exists, but its value is null

array_key_exists('key1', $a); // true
array_key_exists('key2', $a); // true
```

So if you want a null value to be treated the same as a missing key, `isset` is the best bet. It's also marginally better for performance because it's a PHP language construct rather than a function call like `array_key_exists`.
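
When a key holding null should still count as present, the two checks can be combined. The sketch below is a minimal illustration; `has_key` is a hypothetical helper name, not a PHP built-in. It tries `isset` first and falls back to `array_key_exists` only when `isset` reports false.

```php
<?php

// Hypothetical helper (not part of PHP): true when $key is present in $array,
// even if the stored value is null.
function has_key(array $array, $key): bool
{
    // isset() handles the common case; array_key_exists() only runs
    // when the key is missing or its value is null.
    return isset($array[$key]) || array_key_exists($key, $array);
}

$a = ['key1' => '12345678', 'key2' => null];

var_dump(isset($a['key1']));            // bool(true)
var_dump(isset($a['key2']));            // bool(false): the value is null
var_dump(isset($a['key3']));            // bool(false): no such key
var_dump(array_key_exists('key2', $a)); // bool(true)
var_dump(has_key($a, 'key2'));          // bool(true)
var_dump(has_key($a, 'key3'));          // bool(false)
```

Whether the fallback is worth it depends on whether null values can actually appear in the array being checked.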

Looking at a small sample of data, we can see the marginal performance improvements at scale. (Note: it's better not to run all three performance comparisons in the same script, because reusing the same array means there's a small cache warm-up benefit for `array_key_exists` if it comes after `isset`.)

```php
<?php

// Small Loops
$small_array = array();

for ($i = 0; $i < 100; $i++) {
    array_push($small_array, $i);
}

// hrtime(true) returns nanoseconds; dividing by the 1000 iterations
// gives the average time per call. The keys are integers, so 'key1'
// is never present and each lookup is a miss.
$start_small = hrtime(true);
for ($i = 0; $i < 1000; $i++) {
    isset($small_array['key1']);
}
$end_small = hrtime(true);

$small_isset_time = ($end_small - $start_small) / 1000;
echo "Small isset loop is $small_isset_time ns \n";

$start_small = hrtime(true);
for ($i = 0; $i < 1000; $i++) {
    array_key_exists('key1', $small_array);
}
$end_small = hrtime(true);

$small_array_key_exists_time = ($end_small - $start_small) / 1000;
echo "Small array_key_exists loop is $small_array_key_exists_time ns \n";
```

```php
// Medium Loops
$medium_array = array();
for ($i = 0; $i < 1000; $i++) {
    array_push($medium_array, $i);
}

// Average nanoseconds per call over 1000 lookups of a missing key
$start_medium = hrtime(true);
for ($i = 0; $i < 1000; $i++) {
    isset($medium_array['key1']);
}
$end_medium = hrtime(true);

$medium_isset_time = ($end_medium - $start_medium) / 1000;
echo "Medium isset loop is $medium_isset_time ns \n";

$start_medium = hrtime(true);
for ($i = 0; $i < 1000; $i++) {
    array_key_exists('key1', $medium_array);
}
$end_medium = hrtime(true);

$medium_array_key_exists_time = ($end_medium - $start_medium) / 1000;
echo "Medium array_key_exists loop is $medium_array_key_exists_time ns \n";
```

```php
// Large Loops
$large_array = array();
for ($i = 0; $i < 100000; $i++) {
    array_push($large_array, $i);
}

// Average nanoseconds per call over 1000 lookups of a missing key
$start_large = hrtime(true);
for ($i = 0; $i < 1000; $i++) {
    isset($large_array['key1']);
}
$end_large = hrtime(true);

$large_isset_time = ($end_large - $start_large) / 1000;
echo "Large isset loop is $large_isset_time ns \n";

$start_large = hrtime(true);
for ($i = 0; $i < 1000; $i++) {
    array_key_exists('key1', $large_array);
}
$end_large = hrtime(true);

$large_array_key_exists_time = ($end_large - $start_large) / 1000;
echo "Large array_key_exists loop is $large_array_key_exists_time ns \n";
```


# Results:

```bash
Small isset loop is 18.179 ns
Small array_key_exists loop is 30.321 ns

medium isset loop is 16.552 ns
medium array_key_exists loop is 20.224 ns

large isset loop is 16.669 ns
large array_key_exists loop is 19.662 ns
```
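
One caveat about the loops above: the arrays are filled with `array_push`, so their keys are integers and `'key1'` is never present. Both calls are therefore timed on misses only. The following is a minimal sketch that also times hits, assuming PHP 7.3 or later for `hrtime`. The `$n`, `$iterations`, and `"key$i"` names are illustrative choices, not part of the original benchmark, and the numbers will vary by machine and PHP version.

```php
<?php

$n = 1000;          // array size (assumption for illustration)
$iterations = 1000; // lookups per measurement

// Build an array with string keys so that 'key1' actually exists.
$data = [];
for ($i = 0; $i < $n; $i++) {
    $data["key$i"] = $i;
}

$results = [];
foreach (['hit' => 'key1', 'miss' => 'missing'] as $label => $key) {
    // Time isset() and report average nanoseconds per lookup.
    $start = hrtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        isset($data[$key]);
    }
    $results["isset $label"] = (hrtime(true) - $start) / $iterations;

    // Time array_key_exists() on the same key and array.
    $start = hrtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        array_key_exists($key, $data);
    }
    $results["array_key_exists $label"] = (hrtime(true) - $start) / $iterations;
}

foreach ($results as $name => $ns) {
    echo "$name: $ns ns per lookup\n";
}
```

Because both functions run in the same script here, the warm-up caveat noted earlier still applies; running each measurement in its own process would give a cleaner comparison.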

# Usage

Does using one or the other matter? It depends, and it probably doesn't matter much in my specific use case (where the calls to Lucene/ES are a larger concern). But given the null handling and the marginal performance improvement, `isset` is the better default in general.
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
---
bookCollapseSection: true
weight: 10
title: PHP
---

# PHP

People are always surprised when I say that I use PHP as a machine learning engineer. I'd say in any given week, about 30-35% of my time is spent using PHP to serve recommendations in the front-end of the app, and if you work end-to-end with recommender systems, or any machine learning system that gets served as part of a web app, chances are you'll work heavily with the backend language serving it, too.
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
---
weight: 10
bookCollapseSection: true
---

# SQL

People are always surprised when I say that I use PHP as a machine learning engineer. I'd say in any given week, about 30-35% of my time is spent using PHP to serve recommendations in the front-end of the app, and if you work end-to-end with recommender systems, or any machine learning system that gets served as part of a web app, chances are you'll work heavily with the backend language serving it, too.
