Exploring Your Data
Now you'll perform some data exploration using the Python pandas module. To get a sense of the data, you'll output statistics such as mean, median, count, and percentiles.
The DataFrame recent_grads is still in your workspace.
Instructions
- Print the .dtypes of your data so that you know what each column contains.
- Output basic summary statistics using a single pandas function.
- With the same function from before, output summary statistics for all columns that aren't of type object.
script.py
# Print .dtypes
print(recent_grads.dtypes)
# Output summary statistics
print(recent_grads.describe())
# Exclude data of type object
print(recent_grads.describe(exclude=['object']))
rank int64
major_code int64
major object
major_category object
total int64
sample_size int64
men int64
women int64
sharewomen float64
employed int64
full_time int64
part_time int64
full_time_year_round int64
unemployed int64
unemployment_rate float64
median object
p25th object
p75th object
college_jobs int64
non_college_jobs int64
low_wage_jobs int64
dtype: object
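The listing above reports median, p25th, and p75th as object, which usually means the values are stored as strings, so the numeric summary skips them. A minimal sketch of one way to inspect and convert such a column follows, assuming a hypothetical toy DataFrame and an assumed placeholder string "UN" for a missing value; neither is taken from recent_grads.

```python
import pandas as pd

# Hypothetical toy data; "UN" is an assumed placeholder for a missing value
toy = pd.DataFrame({
    "major": ["A", "B", "C"],
    "median": ["110000", "UN", "65000"],
})

# Both columns hold strings at this point
print(toy.dtypes)

# Coerce to numeric; anything that cannot be parsed becomes NaN
toy["median"] = pd.to_numeric(toy["median"], errors="coerce")

# The column is now float64 and shows up in numeric summaries
print(toy.dtypes)
print(toy["median"].describe())
```

Using errors="coerce" keeps the conversion from raising on unparseable entries, at the cost of silently turning them into NaN, so it is worth checking how many values were lost.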
rank major_code total sample_size men \
count 173.000000 173.000000 173.000000 173.000000 173.000000
mean 87.000000 3879.815029 39167.716763 356.080925 16637.358382
std 50.084928 1687.753140 63354.613919 618.361022 28063.394844
min 1.000000 1100.000000 124.000000 2.000000 119.000000
25% 44.000000 2403.000000 4361.000000 39.000000 2110.000000
50% 87.000000 3608.000000 15058.000000 130.000000 5347.000000
75% 130.000000 5503.000000 38844.000000 338.000000 14440.000000
max 173.000000 6403.000000 393735.000000 4212.000000 173809.000000
women sharewomen employed full_time part_time \
count 173.000000 173.000000 173.000000 173.000000 173.000000
mean 22530.358382 0.522550 31192.763006 26029.306358 8832.398844
std 40966.381219 0.230572 50675.002241 42869.655092 14648.179473
min 0.000000 0.000000 0.000000 111.000000 0.000000
25% 1784.000000 0.339671 3608.000000 3154.000000 1030.000000
50% 8284.000000 0.535714 11797.000000 10048.000000 3299.000000
75% 22456.000000 0.702020 31433.000000 25147.000000 9948.000000
max 307087.000000 0.968954 307933.000000 251540.000000 115172.000000
full_time_year_round unemployed unemployment_rate college_jobs \
count 173.000000 173.000000 172.000000 173.000000
mean 19694.427746 2416.329480 0.068587 12322.635838
std 33160.941514 4112.803148 0.029967 21299.868863
min 111.000000 0.000000 0.000000 0.000000
25% 2453.000000 304.000000 0.050723 1675.000000
50% 7413.000000 893.000000 0.068272 4390.000000
75% 16891.000000 2393.000000 0.087599 14444.000000
max 199897.000000 28169.000000 0.177226 151643.000000
non_college_jobs low_wage_jobs
count 173.000000 173.000000
mean 13284.497110 3859.017341
std 23789.655363 6944.998579
min 0.000000 0.000000
25% 1591.000000 340.000000
50% 4595.000000 1231.000000
75% 11783.000000 3466.000000
max 148395.000000 48207.000000
rank major_code total sample_size men \
count 173.000000 173.000000 173.000000 173.000000 173.000000
mean 87.000000 3879.815029 39167.716763 356.080925 16637.358382
std 50.084928 1687.753140 63354.613919 618.361022 28063.394844
min 1.000000 1100.000000 124.000000 2.000000 119.000000
25% 44.000000 2403.000000 4361.000000 39.000000 2110.000000
50% 87.000000 3608.000000 15058.000000 130.000000 5347.000000
75% 130.000000 5503.000000 38844.000000 338.000000 14440.000000
max 173.000000 6403.000000 393735.000000 4212.000000 173809.000000
women sharewomen employed full_time part_time \
count 173.000000 173.000000 173.000000 173.000000 173.000000
mean 22530.358382 0.522550 31192.763006 26029.306358 8832.398844
std 40966.381219 0.230572 50675.002241 42869.655092 14648.179473
min 0.000000 0.000000 0.000000 111.000000 0.000000
25% 1784.000000 0.339671 3608.000000 3154.000000 1030.000000
50% 8284.000000 0.535714 11797.000000 10048.000000 3299.000000
75% 22456.000000 0.702020 31433.000000 25147.000000 9948.000000
max 307087.000000 0.968954 307933.000000 251540.000000 115172.000000
full_time_year_round unemployed unemployment_rate college_jobs \
count 173.000000 173.000000 172.000000 173.000000
mean 19694.427746 2416.329480 0.068587 12322.635838
std 33160.941514 4112.803148 0.029967 21299.868863
min 111.000000 0.000000 0.000000 0.000000
25% 2453.000000 304.000000 0.050723 1675.000000
50% 7413.000000 893.000000 0.068272 4390.000000
75% 16891.000000 2393.000000 0.087599 14444.000000
max 199897.000000 28169.000000 0.177226 151643.000000
non_college_jobs low_wage_jobs
count 173.000000 173.000000
mean 13284.497110 3859.017341
std 23789.655363 6944.998579
min 0.000000 0.000000
25% 1591.000000 340.000000
50% 4595.000000 1231.000000
75% 11783.000000 3466.000000
max 148395.000000 48207.000000
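The two summaries above are identical because describe() summarizes only numeric columns by default, and every non-numeric column here is of type object. The count of 172 for unemployment_rate also suggests that missing values are left out of the count. The sketch below reproduces both effects on a hypothetical toy DataFrame; the column values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one object column and one missing value
toy = pd.DataFrame({
    "major": pd.Series(["A", "B", "C", "D"], dtype=object),
    "total": [100, 250, 75, 300],
    "unemployment_rate": [0.05, np.nan, 0.07, 0.09],
})

# Default summary: numeric columns only
default_stats = toy.describe()

# Explicitly excluding object columns leaves the same numeric columns
no_object_stats = toy.describe(exclude=["object"])

# Compare the two summaries
print(default_stats.equals(no_object_stats))

# count ignores NaN, so unemployment_rate has one fewer observation
print(default_stats.loc["count"])
```

To summarize the object columns as well, describe(include="all") is an option; it adds rows such as unique, top, and freq for the non-numeric columns.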
https://campus.datacamp.com/courses/introduction-to-python-for-data-science-final-lab/19346?ex=2