Plotting Scatterplots
Now that you've calculated the correlation coefficient between the low_wage_jobs and unemployment_rate columns, you want to create a visualization to effectively display this relationship. You'll use matplotlib to create a scatterplot of these two columns.
The DataFrame dept_stats is available in your workspace again, and the columns low_wage_jobs and unemployment_rate have been extracted into variables of the same name.
- Import matplotlib.pyplot with the alias plt.
- Create a scatter plot between unemployment_rate and low_wage_jobs per major category.
- Label the x axis with 'Unemployment rate'.
- Label the y axis with 'Low pay jobs'.
# Import matplotlib
import matplotlib.pyplot as plt
# Create scatter plot
plt.scatter(dept_stats['unemployment_rate'], dept_stats['low_wage_jobs'])
# Label x axis
plt.xlabel('Unemployment rate')
# Label y axis
plt.ylabel('Low pay jobs')
# Display the graph
plt.show()
[picture]
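To try the same steps outside the course workspace, a minimal sketch follows. The dept_stats values below are invented for illustration (the real DataFrame comes from the lab), and the Agg backend line is only an assumption so that the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering (assumption); remove in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for dept_stats; the numbers are made up
dept_stats = pd.DataFrame({
    "unemployment_rate": [0.05, 0.06, 0.07, 0.08, 0.09],
    "low_wage_jobs": [1200, 2500, 3100, 4800, 5200],
})

plt.figure()
plt.scatter(dept_stats["unemployment_rate"], dept_stats["low_wage_jobs"])
plt.xlabel("Unemployment rate")
plt.ylabel("Low pay jobs")

# Pearson correlation of the two columns, to compare with what the plot shows
r = dept_stats["unemployment_rate"].corr(dept_stats["low_wage_jobs"])
# In an interactive session, plt.show() would display the figure here
```

With these made-up numbers the points rise from left to right, which matches a correlation close to 1.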
Modifying Plot Colors
The default settings for matplotlib may not be what you hope to present to others, so you decide to customize your plot for low wages versus unemployment rate.
Use the pandas DataFrame dept_stats again.
- Create the scatterplot visualization between the unemployment rate and number of low wage jobs per major category using plt.scatter().
- Customize this scatterplot so that the points are red triangles by setting the color argument to "r" and the marker argument to "^".
- Display the plot you've created!
# Plot the red and triangle shaped scatter plot
plt.scatter(dept_stats['unemployment_rate'], dept_stats['low_wage_jobs'], color="r", marker='^')
# Display the visualization
plt.show()
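As a rough illustration of what the color and marker arguments do, the sketch below uses invented values in place of the dept_stats columns and inspects the object that plt.scatter() returns. The Agg backend line is an assumption so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering (assumption); remove in a notebook
import matplotlib.pyplot as plt

# Hypothetical values standing in for the two dept_stats columns
unemployment_rate = [0.05, 0.06, 0.07, 0.08]
low_wage_jobs = [1200, 2500, 3100, 4800]

plt.figure()
# color="r" is shorthand for red; marker="^" draws upward-pointing triangles
points = plt.scatter(unemployment_rate, low_wage_jobs, color="r", marker="^")
plt.xlabel("Unemployment rate")
plt.ylabel("Low pay jobs")

# The returned collection stores the face color as an RGBA row
red_rgba = tuple(points.get_facecolor()[0])
```

Other single-letter colors such as "b" or "g" and other markers such as "o" or "s" can be swapped in the same way.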
Plotting Histograms
Now that you've taken a look at that scatterplot, you want to go back to the sharewomen column that you were working with earlier. Specifically, you want to get an idea of how the values of sharewomen are distributed. This means you want to plot a histogram. For your convenience, the sharewomen column has been extracted from the recent_grads DataFrame into a variable called sharewomen.
- Use matplotlib to create a histogram of sharewomen.
- Show the plot you created.
# Plot a histogram of sharewomen
plt.hist(sharewomen)
# Show the plot
plt.show()
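A histogram groups values into bins and counts how many fall into each one. The sketch below makes that visible with a small invented sharewomen Series (the real values come from recent_grads). The bins and range arguments are optional extras that the exercise itself does not use, and the Agg backend line is an assumption so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering (assumption); remove in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical shares between 0 and 1
sharewomen = pd.Series([0.12, 0.15, 0.33, 0.41, 0.48, 0.52, 0.58, 0.67, 0.74, 0.91])

plt.figure()
# plt.hist returns the bar heights, the bin edges, and the drawn patches
counts, bin_edges, patches = plt.hist(sharewomen, bins=5, range=(0, 1))
plt.xlabel("Share of women")
plt.ylabel("Number of majors")
```

Here five bins produce six bin edges, and the counts add up to the number of values in the Series.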
Plotting with pandas
In Python, there are several different ways to create visualizations. In fact, pandas has its own visualization capabilities, all of which are built on top of matplotlib! For example, you could have created the histogram from the previous exercise using recent_grads.sharewomen.plot(kind="hist") instead of plt.hist(recent_grads.sharewomen).
Which approach you use comes down to personal preference, but when working with DataFrames it is often advantageous to use pandas' plotting capabilities because the code tends to be less verbose.
Here, you will practice creating the plots from the previous exercises using pandas instead of matplotlib. pandas plots are created using the .plot() method on a DataFrame or Series. Inside .plot(), you can specify which plot you want to create using the kind parameter. For example, kind='hist' would create a histogram, kind='scatter' would create a scatter plot, and so on.
Use the .plot() method with kind='scatter' on the dept_stats DataFrame to create a scatter plot with 'unemployment_rate' on the x-axis and 'low_wage_jobs' on the y-axis.
# Import matplotlib and pandas
import matplotlib.pyplot as plt
import pandas as pd
# Create scatter plot
dept_stats.plot(kind='scatter', x='unemployment_rate', y='low_wage_jobs')
plt.show()
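One convenience of the pandas route is that .plot() returns the matplotlib Axes it drew on, with the axis labels already set to the column names. The sketch below shows this on an invented dept_stats; the Agg backend line is an assumption so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering (assumption); remove in a notebook
import pandas as pd

# Hypothetical stand-in for dept_stats; the numbers are made up
dept_stats = pd.DataFrame({
    "unemployment_rate": [0.05, 0.06, 0.07, 0.08, 0.09],
    "low_wage_jobs": [1200, 2500, 3100, 4800, 5200],
})

# DataFrame.plot returns the matplotlib Axes object
ax = dept_stats.plot(kind="scatter", x="unemployment_rate", y="low_wage_jobs")

# Labels default to the column names; they can be overridden afterwards
default_labels = (ax.get_xlabel(), ax.get_ylabel())
ax.set_xlabel("Unemployment rate")
ax.set_ylabel("Low pay jobs")
```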
Now, create a histogram of the sharewomen column of the recent_grads DataFrame.
# Import matplotlib and pandas
import matplotlib.pyplot as plt
import pandas as pd
# Create histogram
recent_grads.sharewomen.plot(kind='hist')
plt.show()
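To check that the pandas and matplotlib routes describe the same histogram, the sketch below draws both on an invented recent_grads and compares the bar heights. The bins=5 argument is an illustrative choice, and the Agg backend line is an assumption so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering (assumption); remove in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for recent_grads with only the column needed here
recent_grads = pd.DataFrame({
    "sharewomen": [0.12, 0.15, 0.33, 0.41, 0.48, 0.52, 0.58, 0.67, 0.74, 0.91],
})

# pandas route: the Series draws itself and returns the Axes
ax = recent_grads.sharewomen.plot(kind="hist", bins=5)
pandas_heights = [bar.get_height() for bar in ax.patches]

# matplotlib route, drawn on a separate figure
plt.figure()
counts, _, _ = plt.hist(recent_grads.sharewomen, bins=5)
```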
Plotting One Bar Graph
Next, you want to gauge how many students are graduating from each major category without a job that requires their degree, so you decide to create a bar chart of the number of non-college jobs in each major category.
- First, create a DataFrame to plot. Use recent_grads to make a DataFrame that reports each major category and the number of college graduates with a job that doesn't require a degree. Assign this to a variable named df.
- Plot this data as a bar chart using the .plot() method. Here, kind should be "bar".
- Display the plot you've created!
# DataFrame of non-college job sums
df = recent_grads.groupby(['major_category']).non_college_jobs.sum()
# Plot bar chart
df.plot(kind='bar')
# Show graph
plt.show()
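The groupby-then-sum step can be sketched on a few invented rows to see what ends up in the bar chart. The recent_grads values here are hypothetical, sorting before plotting is an optional extra that the exercise does not ask for, and the Agg backend line is an assumption so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering (assumption); remove in a notebook
import pandas as pd

# Hypothetical rows: one row per major, several majors per category
recent_grads = pd.DataFrame({
    "major_category": ["Arts", "Arts", "Business", "Engineering", "Engineering"],
    "non_college_jobs": [300, 200, 900, 150, 250],
})

# One total per major category (the result is a pandas Series)
totals = recent_grads.groupby("major_category")["non_college_jobs"].sum()

# Sorting first makes the bars easier to compare (optional)
ax = totals.sort_values(ascending=False).plot(kind="bar")
ax.set_ylabel("Graduates in non-college jobs")
bar_heights = [bar.get_height() for bar in ax.patches]
```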
Plotting Two Bar Graphs
The previous visualization gives you a good picture of how many students are working at jobs that don't require a college degree, but it doesn't give you a sense of how each category is doing relative to one another. So you decide to add the college_jobs column as an extra bar of information so that you can evaluate the difference between the two.
- Use pandas to create a DataFrame that reports the number of graduates working at jobs that do require college degrees ('college_jobs'), and do not require college degrees ('non_college_jobs'). Assign this to a variable named df1.
- Create a bar chart of this data with the .plot() method (pandas draws it with matplotlib).
- Display the plot you've created!
# DataFrame of college and non-college job sums
df1 = recent_grads.groupby(['major_category'])[['college_jobs', 'non_college_jobs']].sum()
# Plot bar chart
df1.plot(kind='bar')
# Show graph
plt.show()
[picture]
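Selecting several columns after a groupby takes a list inside the brackets, and plotting the result draws one bar per column for each category. The sketch below shows this on invented rows; the recent_grads values are hypothetical and the Agg backend line is an assumption so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering (assumption); remove in a notebook
import pandas as pd

# Hypothetical rows: one row per major, several majors per category
recent_grads = pd.DataFrame({
    "major_category": ["Arts", "Arts", "Business", "Engineering", "Engineering"],
    "college_jobs": [400, 350, 1200, 900, 800],
    "non_college_jobs": [300, 200, 900, 150, 250],
})

# A list inside the brackets selects both columns at once
df1 = recent_grads.groupby("major_category")[["college_jobs", "non_college_jobs"]].sum()

# Each category gets two side-by-side bars, one per column
ax = df1.plot(kind="bar")
ax.set_ylabel("Number of graduates")
n_bars = len(ax.patches)
```

Passing stacked=True to .plot() would stack the two bars of each category instead of placing them side by side.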