Select a Column
The pandas library lets you select a single column from a DataFrame, which is useful when you only need to work with one variable. In this exercise, you'll select the sharewomen column, which holds the share of women in each department, stored as a fraction between 0 and 1.
The DataFrame recent_grads is still in your workspace.
Instructions
- Select the sharewomen column from recent_grads and assign this to a variable named sw_col.
- Output the first 5 rows of sw_col.
# Select sharewomen column
sw_col = recent_grads['sharewomen']
# Output first five rows
print(sw_col.head(5))
<script.py> output:
0 0.120564
1 0.101852
2 0.153037
3 0.107313
4 0.341631
Name: sharewomen, dtype: float64
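The selection above returns a one-dimensional Series. As a minimal sketch of how that differs from selecting with a list of column names, the example below uses a small made-up DataFrame named toy (a hypothetical stand-in, not the real recent_grads data):

```python
import pandas as pd

# Hypothetical stand-in for recent_grads; the values are illustrative only
toy = pd.DataFrame({
    "major": ["A", "B", "C"],
    "sharewomen": [0.25, 0.75, 0.5],
})

# Single brackets with a column name return a one-dimensional Series
col_series = toy["sharewomen"]

# Double brackets (a list of column names) return a DataFrame with one column
col_frame = toy[["sharewomen"]]

print(type(col_series).__name__)
print(type(col_frame).__name__)

# head(n) works on both and returns the first n rows
print(col_series.head(2))
```

The Series form is what the later exercises rely on, since comparisons such as sw_col == max_sw produce one True/False value per row.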
Column Maximum Value
Now that you've selected the sharewomen column, you'll use numpy to output its maximum value.
The variable sw_col you created in the last exercise is still available in your workspace.
- Import numpy as np.
- Using a numpy built-in function, find the maximum value of the sharewomen column and assign this value to the variable max_sw.
- Print the value of max_sw.
# Import numpy
import numpy as np
# Use np.max to find the maximum value of the column
max_sw = np.max(sw_col)
# Print column max
print(max_sw)
<script.py> output:
0.968953683
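The exercise asks for a numpy function, but a pandas Series also has its own max method. The sketch below, which assumes a small made-up Series named toy_col rather than the real sw_col, shows that both approaches give the same result:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for sw_col; the values are illustrative only
toy_col = pd.Series([0.25, 0.75, 0.5], name="sharewomen")

# numpy function applied to the Series
max_np = np.max(toy_col)

# Equivalent pandas method on the Series itself
max_pd = toy_col.max()

print(max_np, max_pd)
```

Either form should be fine for this exercise; np.max is used in the solution because the instructions call for a numpy built-in.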
Selecting a Row
You now know the maximum percentage of women in a department, but which department is it? You'll find out by filtering the dataset with pandas.
The variables sw_col and max_sw are still in your workspace.
- Output the row of data for the department that has the largest percentage of women.
# Output the row containing the maximum percentage of women
print(____)
# Output the row containing the maximum percentage of women
print(recent_grads.loc[sw_col == max_sw])
<script.py> output:
     rank  major_code                        major             major_category  \
162   163        5502  ANTHROPOLOGY AND ARCHEOLOGY  Humanities & Liberal Arts
     total  sample_size   men  women  sharewomen  employed  ...  \
162  38844          247  1167  36422    0.968954     29633  ...
     part_time  full_time_year_round  unemployed  unemployment_rate  median  \
162      14515                 13232        3395           0.102792   28000
     p25th  p75th  college_jobs  non_college_jobs  low_wage_jobs
162  20000  38000          9805             16693           6866
[1 rows x 21 columns]
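The filter works because sw_col == max_sw builds a boolean Series with one True/False value per row, and .loc keeps only the rows marked True. Here is a minimal sketch of that mechanism, plus an alternative using idxmax, on a small made-up DataFrame named toy (a hypothetical stand-in for recent_grads):

```python
import pandas as pd

# Hypothetical stand-in for recent_grads; the values are illustrative only
toy = pd.DataFrame({
    "major": ["A", "B", "C"],
    "sharewomen": [0.25, 0.75, 0.5],
})

# Step 1: compare every value to the maximum to get a boolean mask
mask = toy["sharewomen"] == toy["sharewomen"].max()
print(mask.tolist())

# Step 2: .loc with a boolean mask keeps the rows where the mask is True
top_rows = toy.loc[mask]
print(top_rows)

# Alternative: idxmax returns the index label of the first maximum,
# and .loc with a single label returns that row as a Series
top_label = toy["sharewomen"].idxmax()
top_row = toy.loc[top_label]
print(top_row["major"])
```

The mask approach returns a DataFrame and keeps every row that ties for the maximum, while idxmax returns only the first such row.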