Replacing Missing Values
Some missing values in the dataset are encoded as a string. You'll replace them with NaN, the value that pandas and numpy treat as "missing."
The list columns contains the names of the columns you'll be working with in this exercise.
Instructions
- Look at the dtypes of the columns in columns to check whether the data is numeric.
- It looks like a string is being used to encode missing values. Use the .unique() method to figure out what the string is.
- Search for missing values in the median, p25th, and p75th columns.
- Replace the missing values you found with NaN, using numpy's np.nan.
import numpy as np  # provides np.nan (skip if numpy is already imported)
# Names of the columns we're searching for missing values
columns = ['median', 'p25th', 'p75th']
# Take a look at the dtypes
print(recent_grads.dtypes)
print(recent_grads[columns].dtypes)
# Find how missing values are represented
print(recent_grads["median"].unique())
# Replace missing values with NaN
for column in columns:
    recent_grads.loc[recent_grads[column] == 'UN', column] = np.nan
print(recent_grads[columns].dtypes)
median object
p25th object
p75th object
dtype: object
Result of print(recent_grads.dtypes):
rank int64
major_code int64
major object
major_category object
total int64
sample_size int64
men int64
women int64
sharewomen float64
employed int64
full_time int64
part_time int64
full_time_year_round int64
unemployed int64
unemployment_rate float64
median object
p25th object
p75th object
college_jobs int64
non_college_jobs int64
low_wage_jobs int64
dtype: object
['110000' '75000' '73000' '70000' '65000' 'UN' '62000' '60000' '58000'
'57100' '57000' '56000' '54000' '53000' '52000' '51000' '50000' '48000'
'47000' '46000' '45000' '44700' '44000' '42000' '41300' '41000' '40100'
'40000' '39000' '38400' '38000' '37500' '37400' '37000' '36400' '36200'
'36000' '35600' '35000' '34000' '33500' '33400' '33000' '32500' '32400'
'32200' '32100' '32000' '31500' '31000' '30500' '30000' '29000' '28000'
'27500' '27000' '26000' '25000' '23400' '22000']
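The steps above can be tried without the course data. The following is a minimal sketch on a small made-up DataFrame: the name `toy` and its values are hypothetical and are not taken from recent_grads. It counts the 'UN' entries, replaces them with np.nan as in the loop above, and checks the result with isna(). The final pd.to_numeric call is not part of the exercise code; it is shown only as one possible way to get numeric columns afterwards, since replacing 'UN' alone leaves the remaining values as strings.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: numbers stored as strings, 'UN' marks a missing value
toy = pd.DataFrame({
    "median": ["110000", "UN", "73000"],
    "p25th": ["95000", "55000", "UN"],
    "p75th": ["125000", "UN", "105000"],
})
columns = ["median", "p25th", "p75th"]

# Count how often the sentinel string appears in each column
un_counts = (toy[columns] == "UN").sum()

# Replace the sentinel with NaN, column by column
for column in columns:
    toy.loc[toy[column] == "UN", column] = np.nan

# isna() now reports the replaced cells as missing
na_counts = toy[columns].isna().sum()

# The remaining values are still strings; to_numeric (an addition,
# not in the exercise code) converts each column to a numeric dtype
numeric = toy[columns].apply(pd.to_numeric)

print(un_counts)
print(na_counts)
print(numeric.dtypes)
```

The number of NaN values per column after the replacement should match the number of 'UN' strings counted before it, which is a quick check that no sentinel was missed.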
Hint:
https://stackoverflow.com/questions/50876108/names-of-the-columns-were-searching-for-missing-values