[DAT208x] final lab 1-7, 1-8
1-7 Converting a DataFrame to Numpy Array
Because numpy is such a powerful Python module, this exercise asks you to convert a pandas DataFrame to a numpy array, so that the next exercise can apply a statistical metric available through numpy.
- Select the columns unemployed and low_wage_jobs from recent_grads, then convert them to a numpy array. Save this as recent_grads_np.
- Print the type of recent_grads_np to see that it is a numpy array.
answer
# Convert to numpy array (.as_matrix() was removed in pandas 1.0; .values works in old and new versions)
recent_grads_np = recent_grads[['unemployed','low_wage_jobs']].values
# Print the type of recent_grads_np
print(type(recent_grads_np))
In [3]: # Convert to numpy array
recent_grads_np = recent_grads[['unemployed','low_wage_jobs']].values
# Print the type of recent_grads_np
print(type(recent_grads_np))
<class 'numpy.ndarray'>
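The conversion step can be tried outside the lab with a small stand-in frame. This is only a sketch: the three rows are the first three printed in this lab, the `other` column is a hypothetical placeholder so that the column selection has something to drop, and `.to_numpy()` assumes pandas 0.24 or newer (`.values` is the older equivalent).

```python
import numpy as np
import pandas as pd

# Stand-in for recent_grads: first three rows printed in this lab,
# plus a hypothetical extra column that the selection leaves out.
recent_grads = pd.DataFrame({
    "other": ["a", "b", "c"],          # hypothetical placeholder column
    "unemployed": [37, 85, 16],
    "low_wage_jobs": [193, 50, 0],
})

# Selecting columns alone still yields a DataFrame.
subset = recent_grads[["unemployed", "low_wage_jobs"]]

# The explicit conversion is what produces the numpy array.
recent_grads_np = subset.to_numpy()

print(type(subset))
print(type(recent_grads_np))
print(recent_grads_np.shape)
```

The selection on its own does not change the type; only the final conversion call does.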
trial errors
In [1]: # Convert to numpy array
recent_grads_np = recent_grads[['unemployed','low_wage_jobs']]
# Print the type of recent_grads_np
print(recent_grads_np)
unemployed low_wage_jobs
0 37 193
1 85 50
2 16 0
3 40 0
4 1672 972
5 400 244
6 308 259
7 33 220
8 4650 3253
9 3895 3170
10 2275 980
11 794 372
12 1019 789
13 78 81
14 23 263
15 589 524
16 699 640
17 2859 3192
18 170 137
19 11 144
20 6884 5144
21 338 485
22 824 696
23 70 70
24 1015 708
25 3270 2899
26 1042 703
27 504 285
28 597 365
29 670 340
.. ... ...
143 314 1231
144 266 591
145 28169 48207
146 3918 9286
147 1920 2042
148 1128 3426
149 5486 11880
150 3355 5248
151 3329 4344
152 917 2125
153 1465 2840
154 496 722
155 419 1650
156 326 724
157 372 1141
158 1617 3304
159 1368 3586
160 510 3163
161 82 31
162 3395 6866
163 1487 5125
164 1360 2868
165 846 1115
166 3040 11068
167 1340 3466
168 304 743
169 148 82
170 368 622
171 214 308
172 87 192
[173 rows x 2 columns]
In [2]: # Convert to numpy array
recent_grads_np = recent_grads[['unemployed','low_wage_jobs']]
# Print the type of recent_grads_np
print(type(recent_grads_np))
<class 'pandas.core.frame.DataFrame'>
1-8 Correlation Coefficient
You have some suspicion that there's a relationship between the low_wage_jobs and unemployed columns (the two columns stored in recent_grads_np), so you decide to use numpy to calculate the correlation coefficient.
Calculate the correlation matrix of the numpy array recent_grads_np.
# Calculate correlation matrix
print(np.corrcoef(____))
# Calculate correlation matrix
#print(recent_grads_np[:,0])
print(np.corrcoef(recent_grads_np[:,0],recent_grads_np[:,1]))
trial errors
#print(recent_grads_np)
#print(np.corrcoef(low_wage_jobs,unemployment_rate))
#print(np.corrcoef(recent_grads_np))
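A likely reason the last attempt does not give a 2 x 2 matrix is that `np.corrcoef` treats each row as a variable by default (`rowvar=True`), so a 173 x 2 array yields a 173 x 173 result. The sketch below illustrates this on the first five rows printed in this lab; the variable names are made up for the illustration.

```python
import numpy as np

# First five rows of recent_grads_np as printed in this lab
# (columns: unemployed, low_wage_jobs).
sample = np.array([
    [37, 193],
    [85, 50],
    [16, 0],
    [40, 0],
    [1672, 972],
])

# Default rowvar=True: each ROW is a variable, so 5 rows give a 5x5 matrix.
by_rows = np.corrcoef(sample)

# rowvar=False: each COLUMN is a variable, so 2 columns give a 2x2 matrix.
by_cols = np.corrcoef(sample, rowvar=False)

# Passing the two columns as separate arguments is equivalent.
two_args = np.corrcoef(sample[:, 0], sample[:, 1])

print(by_rows.shape)
print(by_cols.shape)
print(np.allclose(by_cols, two_args))
```

Either `rowvar=False` or passing the two column slices separately should produce the same 2 x 2 correlation matrix.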
answer
[[1. 0.95538815]
[0.95538815 1. ]]
reference values
In [1]: # Calculate correlation matrix
print(recent_grads_np)
#print(np.corrcoef(low_wage_jobs,unemployment_rate))
[[ 37 193]
[ 85 50]
[ 16 0]
[ 40 0]
[ 1672 972]
[ 400 244]
[ 308 259]
[ 33 220]
[ 4650 3253]
[ 3895 3170]
[ 2275 980]
[ 794 372]
[ 1019 789]
[ 78 81]
[ 23 263]
[ 589 524]
[ 699 640]
[ 2859 3192]
[ 170 137]
[ 11 144]
[ 6884 5144]
[ 338 485]
[ 824 696]
[ 70 70]
[ 1015 708]
[ 3270 2899]
[ 1042 703]
[ 504 285]
[ 597 365]
[ 670 340]
[ 308 260]
[ 163 142]
[ 286 755]
[ 49 49]
[ 8497 6193]
[ 9413 9910]
[11452 10653]
[ 1165 1284]
[ 129 480]
[ 137 124]
[12411 10886]
[ 2884 4569]
[ 2934 1672]
[ 1282 1823]
[ 505 1002]
[ 639 608]
[ 401 343]
[ 385 357]
[ 107 93]
[ 99 186]
[ 74 245]
[ 407 1270]
[ 0 25]
[ 419 263]
[ 223 135]
[ 88 0]
[ 2271 2499]
[14946 27320]
[ 4366 4221]
[ 2092 3046]
[ 977 1121]
[ 1067 1168]
[ 1150 1758]
[ 649 1362]
[ 178 839]
[ 416 386]
[ 250 406]
[ 87 201]
[ 215 573]
[ 138 302]
[ 286 272]
[ 182 94]
[ 42 269]
[ 0 0]
[ 2769 4288]
[ 64 81]
[21502 32395]
[11663 27968]
[15022 19803]
[ 1799 1905]
[ 693 1246]
[ 721 308]
[ 2249 3012]
[ 0 56]
[ 1100 352]
[ 677 959]
[ 1315 1906]
[ 757 1336]
[ 893 1422]
[ 789 496]
[ 36 221]
[ 33 37]
[ 1779 3175]
[14602 27440]
[11268 18404]
[ 8947 14839]
[ 4535 8512]
[ 2727 5751]
[ 3305 7214]
[ 1668 3677]
[ 1067 1179]
[ 1088 2237]
[ 1743 1895]
[ 975 2449]
[ 1518 1391]
[ 2006 2495]
[ 962 557]
[ 842 1405]
[ 463 902]
[ 749 1061]
[ 78 237]
[ 322 327]
[ 0 0]
[ 7195 11443]
[11176 16839]
[ 3132 5267]
[ 1718 3168]
[ 1012 1806]
[ 1833 1854]
[ 216 786]
[ 0 111]
[ 529 1159]
[ 483 459]
[13874 28339]
[ 8608 13748]
[ 4410 6429]
[ 2409 4468]
[ 2393 9063]
[ 1379 2819]
[ 1302 2085]
[ 547 657]
[ 757 1470]
[ 437 976]
[ 833 1385]
[ 2183 3816]
[ 4267 8051]
[ 1206 2767]
[14345 26503]
[ 7297 11502]
[ 5593 16838]
[ 4657 9030]
[ 3718 5862]
[ 1108 1634]
[ 314 1231]
[ 266 591]
[28169 48207]
[ 3918 9286]
[ 1920 2042]
[ 1128 3426]
[ 5486 11880]
[ 3355 5248]
[ 3329 4344]
[ 917 2125]
[ 1465 2840]
[ 496 722]
[ 419 1650]
[ 326 724]
[ 372 1141]
[ 1617 3304]
[ 1368 3586]
[ 510 3163]
[ 82 31]
[ 3395 6866]
[ 1487 5125]
[ 1360 2868]
[ 846 1115]
[ 3040 11068]
[ 1340 3466]
[ 304 743]
[ 148 82]
[ 368 622]
[ 214 308]
[ 87 192]]