[DAT208x] final lab 1-4, 1-5, 1-6

Select a Column

Python's pandas module lets you select a specific column from a DataFrame, which is especially useful when you only need to work with one piece of the data. In this exercise, you'll select the sharewomen column, which shows the share of women for a given department as a fraction between 0 and 1.

The DataFrame recent_grads is still in your workspace.

Instructions

 

Select the sharewomen column from recent_grads and assign it to a variable named sw_col.

Output the first 5 rows of sw_col.

 

# Select sharewomen column
sw_col = recent_grads['sharewomen']

# Output first five rows
print(sw_col.head(5))

 

0    0.120564
1    0.101852
2    0.153037
3    0.107313
4    0.341631
Name: sharewomen, dtype: float64
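
To try the selection outside the course workspace, here is a minimal sketch. The small DataFrame demo_grads and its values are invented for illustration (they are not the course's recent_grads data); only the bracket selection and head() mirror the exercise above.

```python
import pandas as pd

# Hypothetical stand-in for recent_grads; the values are invented for illustration.
demo_grads = pd.DataFrame({
    "major": ["A", "B", "C"],
    "sharewomen": [0.25, 0.5, 0.75],
})

# Bracket selection with a single column name returns a pandas Series.
demo_col = demo_grads["sharewomen"]
print(type(demo_col).__name__)  # Series

# Double brackets (a list of names) return a one-column DataFrame instead.
demo_frame = demo_grads[["sharewomen"]]
print(type(demo_frame).__name__)  # DataFrame

# head(n) returns the first n rows; with fewer than n rows it returns them all.
print(demo_col.head(2))
```

The Series keeps the column name and the original row index, which is why the exercise output ends with "Name: sharewomen".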


 

Column Maximum Value

Now that you've selected the sharewomen column, you'll use numpy to output its maximum value.

The variable sw_col you created in the last exercise is still available in your workspace.

  • Import numpy as np.
  • Using a numpy built-in function, find the maximum value of the sharewomen column and assign this value to the variable max_sw.
  • Print the value of max_sw.

 

# Import numpy
import numpy as np

# Use np.max to find the column's maximum value
max_sw = np.max(sw_col)

# Print column max
print(max_sw)

 

0.968953683
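
As a rough sketch of how the maximum behaves, the example below uses a made-up column (demo_col and with_gap are hypothetical, not the course data). It assumes you may run into missing values in your own data, which the exercise itself does not cover.

```python
import numpy as np
import pandas as pd

# Hypothetical column; the values are invented for illustration.
demo_col = pd.Series([0.25, 0.5, 0.75], name="sharewomen")

# np.max and the Series method agree on data without missing values.
demo_max = np.max(demo_col)
assert demo_max == demo_col.max()
print(demo_max)  # 0.75

# With a missing value, a plain NumPy array propagates NaN through np.max,
# while np.nanmax ignores it.
with_gap = np.array([0.25, np.nan, 0.75])
print(np.isnan(np.max(with_gap)))  # True
print(np.nanmax(with_gap))         # 0.75
```

If a column might contain NaN, np.nanmax or the Series method max() is usually the safer choice.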


 

Selecting a Row

You now know the maximum percentage of women in a department, but which department is it? You'll find out by filtering the dataset with pandas.

The variables sw_col and max_sw are still in your workspace.

 

Output the row of data for the department that has the largest percentage of women.

 

# Output the row containing the maximum percentage of women
print(____)

# Output the row containing the maximum percentage of women
print(recent_grads.loc[sw_col == max_sw])

 

     rank  major_code                        major             major_category  \
162   163        5502  ANTHROPOLOGY AND ARCHEOLOGY  Humanities & Liberal Arts

     total  sample_size   men  women  sharewomen  employed      ...        \
162  38844          247  1167  36422    0.968954     29633      ...

     part_time  full_time_year_round  unemployed  unemployment_rate  median  \
162      14515                 13232        3395           0.102792   28000

     p25th  p75th  college_jobs  non_college_jobs  low_wage_jobs
162  20000  38000          9805             16693           6866

[1 rows x 21 columns]
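
The filter above works through a boolean mask. Here is a minimal sketch of that mechanism on a made-up DataFrame (demo_grads and its values are hypothetical, not the course data). The idxmax variant at the end is an alternative, not the exercise's solution.

```python
import pandas as pd

# Hypothetical stand-in for recent_grads; the values are invented for illustration.
demo_grads = pd.DataFrame({
    "major": ["A", "B", "C"],
    "sharewomen": [0.25, 0.75, 0.5],
})
demo_col = demo_grads["sharewomen"]
demo_max = demo_col.max()

# Comparing a Series to a scalar yields a boolean Series (the mask).
mask = demo_col == demo_max
print(mask.tolist())  # [False, True, False]

# .loc with a boolean mask keeps the rows where the mask is True.
# The result is a DataFrame and would hold several rows if the maximum were tied.
top_rows = demo_grads.loc[mask]
print(top_rows)

# idxmax returns the index label of the first maximum,
# so .loc with that label returns a single row as a Series.
top_row = demo_grads.loc[demo_col.idxmax()]
print(top_row["major"])  # B
```

The mask approach returns every row that ties for the maximum, while idxmax returns only the first one, so pick whichever fits your question.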
