Pandas DataFrame Examples with Plots
Pandas is a popular data manipulation and analysis library for Python. It is conventionally imported under the alias pd:
import pandas as pd
Creating and Exploring Pandas DataFrames
The primary data structure in Pandas is the DataFrame, which represents tabular data as labeled rows and columns. You can create a DataFrame from many sources, such as CSV files, Excel files, and SQL databases, or directly from Python lists and dictionaries.
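As a sketch of the SQL case, the snippet below builds a throwaway in-memory SQLite database with Python's built-in sqlite3 module and reads a query result with pd.read_sql_query, which accepts a sqlite3 connection directly. The table name people and its columns are hypothetical, chosen only for illustration.

```python
import sqlite3

import pandas as pd

# Hypothetical in-memory database with a 'people' table (illustration only)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people (name, age) VALUES (?, ?)",
    [("Alice", 25), ("Bob", 30), ("Charlie", 35)],
)
conn.commit()

# Each column in the query result becomes a DataFrame column
people = pd.read_sql_query("SELECT name, age FROM people ORDER BY age", conn)
conn.close()

print(people)
```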
Here's an example of creating a DataFrame from a Python dictionary:
data = {'Name': ['Alice', 'Bob', 'Charlie', 'Tom', 'Eve', 'Frank'],
        'Age': [25, 30, 35, 42, 51, 60],
        'City': ['New York', 'London', 'Paris', 'New York', 'London', 'Tokyo']}
df = pd.DataFrame(data)
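The same kind of table can also be built from Python lists. In this sketch a list of rows needs the column names passed separately through the columns parameter, while a list of dictionaries takes its column names from the keys. The smaller two-row dataset is only for illustration.

```python
import pandas as pd

# List of lists: one inner list per row, column names given separately
rows = [['Alice', 25, 'New York'],
        ['Bob', 30, 'London']]
df_rows = pd.DataFrame(rows, columns=['Name', 'Age', 'City'])

# List of dictionaries: one dict per row, keys become column names
records = [{'Name': 'Alice', 'Age': 25, 'City': 'New York'},
           {'Name': 'Bob', 'Age': 30, 'City': 'London'}]
df_records = pd.DataFrame(records)

# Dictionary of lists: one list per column
df_dict = pd.DataFrame({'Name': ['Alice', 'Bob'],
                        'Age': [25, 30],
                        'City': ['New York', 'London']})

# All three constructions produce identical DataFrames
print(df_rows.equals(df_records), df_rows.equals(df_dict))
```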
Viewing the data
df.head()  # Returns the first rows of the DataFrame (5 by default; df.head(n) returns n rows).
# Returns the last 3 rows of the DataFrame (tail() also defaults to 5).
df.tail(3)
# Returns the dimensions of the DataFrame (number of rows, number of columns).
df.shape
# Returns the column names of the DataFrame.
df.columns
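A quick way to confirm what these calls return: head() and tail() fall back to five rows when no count is given, and the dtypes attribute (not used in the cells above) reports each column's data type. The sketch rebuilds the same six-row DataFrame so it runs on its own.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'Tom', 'Eve', 'Frank'],
        'Age': [25, 30, 35, 42, 51, 60],
        'City': ['New York', 'London', 'Paris', 'New York', 'London', 'Tokyo']}
df = pd.DataFrame(data)

first = df.head()   # no argument: the first 5 of the 6 rows
last = df.tail(3)   # explicit count: the last 3 rows
print(len(first), last['Name'].tolist())
print(df.shape)     # (rows, columns)
print(df.dtypes)    # data type of each column; 'Age' holds integers
```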
Accessing and selecting data
# Accesses a specific column by name.
# df['ColumnName']
df['Name']
# Accesses a single value by row label (index) and column name.
# df.loc[row_label, column_name]
df.loc[0, 'Name']
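Two related selections, sketched on the same data: passing a list of names returns several columns at once, and comparing .loc with .iloc shows that .loc looks rows up by index label while .iloc uses integer position. Sorting by Age in descending order keeps each row's original label, so the two lookups return different people.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'Tom', 'Eve', 'Frank'],
        'Age': [25, 30, 35, 42, 51, 60],
        'City': ['New York', 'London', 'Paris', 'New York', 'London', 'Tokyo']}
df = pd.DataFrame(data)

# A list of column names returns a smaller DataFrame
subset = df[['Name', 'City']]

# sort_values keeps the original index labels (0..5) attached to each row
by_age = df.sort_values('Age', ascending=False)

by_label = by_age.loc[0, 'Name']       # row labelled 0: 'Alice'
by_position = by_age.iloc[0]['Name']   # first row in current order: 'Frank'
print(subset.shape, by_label, by_position)
```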
Data summary
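# Summary statistics (count, mean, std, min, quartiles, max) for the numeric columns
df.describe()
# Counts how many times each unique value occurs in a column
# df['ColumnName'].value_counts()
df['Name'].value_counts()
# Groups data by a column and calculates the mean Age for each group
# df.groupby('ColumnName')['OtherColumn'].mean()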
df.groupby('City')['Age'].mean()
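Running these summaries on the sample data gives numbers that are easy to check by hand. The sketch also shows numeric_only=True, one way to average every numeric column per group without first selecting Age; in pandas 2.0 and later, calling mean() on the whole grouped DataFrame without it raises a TypeError because the text column Name cannot be averaged.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'Tom', 'Eve', 'Frank'],
        'Age': [25, 30, 35, 42, 51, 60],
        'City': ['New York', 'London', 'Paris', 'New York', 'London', 'Tokyo']}
df = pd.DataFrame(data)

summary = df.describe()                  # only 'Age' is numeric here
city_counts = df['City'].value_counts()  # occurrences of each city
mean_age = df.groupby('City')['Age'].mean()

# Alternative: average every numeric column in each group, skipping text columns
mean_numeric = df.groupby('City').mean(numeric_only=True)

print(summary.loc['mean', 'Age'])  # (25 + 30 + 35 + 42 + 51 + 60) / 6 = 40.5
print(city_counts)
print(mean_age)
```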
Data manipulation
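Apart from column assignment, the methods below return a new DataFrame and leave df unchanged unless you assign the result (for example df = df.drop('ColumnName', axis=1)).
df['NewColumn'] = ... - Creates a new column from existing columns or a calculation.
df.drop('ColumnName', axis=1) - Returns a copy of the DataFrame without the given column.
df.sort_values('ColumnName') - Returns a copy of the DataFrame sorted by the given column (ascending by default).
df.fillna(value) - Returns a copy with missing (NaN) values replaced by the specified value.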
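A runnable sketch of the four operations on the sample data. The column name AgeInMonths and the small Score table with a missing value are hypothetical, added only so each call has something to act on.

```python
import numpy as np
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'Tom', 'Eve', 'Frank'],
        'Age': [25, 30, 35, 42, 51, 60],
        'City': ['New York', 'London', 'Paris', 'New York', 'London', 'Tokyo']}
df = pd.DataFrame(data)

# New column computed from an existing one (hypothetical name 'AgeInMonths')
df['AgeInMonths'] = df['Age'] * 12

# drop returns a copy without 'City'; df itself still has the column
without_city = df.drop('City', axis=1)

# Oldest person first
by_age = df.sort_values('Age', ascending=False)

# Hypothetical table with one missing value, which fillna replaces with 0
scores = pd.DataFrame({'Score': [1.5, np.nan, 3.0]})
filled = scores.fillna(0)

print(without_city.columns.tolist())
print(by_age['Name'].tolist())
print(filled)
```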
Reading and Writing Data
# Write the DataFrame to a CSV file; index=False skips writing the row index as an extra column
df.to_csv("myfile.csv", index=False)
# Read a CSV file into a Pandas DataFrame
df2 = pd.read_csv("myfile.csv")
df2.head()
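By default to_csv also writes the row index as an unlabeled first column, which read_csv then loads back as an extra column named 'Unnamed: 0'. This sketch shows both behaviours inside a temporary directory (so no myfile.csv is left behind) and checks that writing with index=False gives back the original DataFrame.

```python
import os
import tempfile

import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'Tom', 'Eve', 'Frank'],
        'Age': [25, 30, 35, 42, 51, 60],
        'City': ['New York', 'London', 'Paris', 'New York', 'London', 'Tokyo']}
df = pd.DataFrame(data)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "myfile.csv")

    # Default: the index (0..5) is written as an unnamed first column
    df.to_csv(path)
    with_index = pd.read_csv(path)

    # index=False writes only the data columns
    df.to_csv(path, index=False)
    df2 = pd.read_csv(path)

print(with_index.columns.tolist())  # includes an extra 'Unnamed: 0' column
print(df2.equals(df))
```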
Plotting examples
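Pandas integrates with Matplotlib, a popular Python plotting library: the plot method of a DataFrame or Series draws its chart with Matplotlib, so the result can be adjusted with the usual pyplot functions such as plt.xlabel and plt.title. Here are a few examples of plotting data with Pandas: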
Line Plot
You can create a line plot using the plot method of a DataFrame. Here's an example of plotting a line graph to visualize the trend of a numerical column over time:
import pandas as pd
import matplotlib.pyplot as plt
# Use a larger font and figure size for all subsequent plots
plt.rcParams.update({'font.size': 20, 'figure.figsize': [14, 10]})
# Create a DataFrame with time-series data
data = {'Date': ['2022-01-01', '2022-02-01', '2022-03-01', '2022-04-01'],
        'Value': [10, 20, 15, 25]}
df = pd.DataFrame(data)
# Convert the 'Date' column to datetime type
df['Date'] = pd.to_datetime(df['Date'])
# Plot the line graph with the DataFrame's plot method (it draws with Matplotlib)
df.plot(x='Date', y='Value', legend=False)
plt.xlabel('Date')
plt.ylabel('Value')
plt.title('Line Plot')
plt.show()
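DataFrame.plot returns the Matplotlib Axes it draws on, so labels and the title can also be set on that object instead of through plt.xlabel and the other pyplot functions. This is a sketch of that object-oriented style for the same data; it assumes Matplotlib is installed alongside pandas.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'Date': pd.to_datetime(['2022-01-01', '2022-02-01', '2022-03-01', '2022-04-01']),
    'Value': [10, 20, 15, 25],
})

# df.plot draws with Matplotlib and returns the Axes object
ax = df.plot(x='Date', y='Value', legend=False)

# Configure the chart through the Axes rather than the pyplot state
ax.set_xlabel('Date')
ax.set_ylabel('Value')
ax.set_title('Line Plot')
plt.show()
```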
Bar Plot
Pandas makes it easy to create bar plots using the plot method. Here's an example of plotting a bar chart to compare values in different categories:
import pandas as pd
import matplotlib.pyplot as plt
# Create a DataFrame with categorical data
data = {'Category': ['A', 'B', 'C', 'D'],
        'Value': [10, 20, 15, 25]}
df = pd.DataFrame(data)
# Plot the bar chart; rot=0 keeps the category labels horizontal
df.plot.bar(x='Category', y='Value', rot=0, legend=False)
plt.xlabel('Category')
plt.ylabel('Value')
plt.title('Bar Plot')
plt.show()
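When the category names are the index of a Series, Series.plot.bar uses them as the bar labels, which makes it easy to sort the values before plotting. A sketch that orders the bars from largest to smallest on the same data; the descending order is a presentation choice, and the explicit plt.subplots call simply gives the chart its own fresh Axes.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'Category': ['A', 'B', 'C', 'D'],
                   'Value': [10, 20, 15, 25]})

# Category names become the index, so they label the bars
values = df.set_index('Category')['Value']

# Draw onto a new Axes, tallest bar first; rot=0 keeps labels horizontal
fig, ax = plt.subplots()
values.sort_values(ascending=False).plot.bar(ax=ax, rot=0)
ax.set_ylabel('Value')
ax.set_title('Bar Plot (sorted)')
plt.show()
```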
Histogram
Pandas provides a simple way to create histograms using the plot method. Here's an example of plotting a histogram to visualize the distribution of a numerical column:
import pandas as pd
import matplotlib.pyplot as plt
# Create a DataFrame with numerical data
data = {'Value': [10, 20, 15, 25, 30, 40, 35, 50]}
df = pd.DataFrame(data)
# Plot the histogram
df['Value'].plot.hist(bins=10) # 'bins' specifies the number of bins or intervals
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.title('Histogram')
plt.show()
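To see what the bars represent, the counts can be computed directly. Pandas draws the histogram with Matplotlib's hist, which bins data the same way as numpy.histogram: bins=10 splits the range from the minimum (10) to the maximum (50) into ten equal intervals of width 4, each including its left edge, with the last interval also including 50. A sketch on the same eight values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Value': [10, 20, 15, 25, 30, 40, 35, 50]})

# Ten equal-width bins spanning the minimum to the maximum value
counts, edges = np.histogram(df['Value'], bins=10)

# Print each interval and how many values fall into it
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:4.0f}-{right:4.0f}: {n}")
```

Two of the ten bins (26 to 30 and 42 to 46) stay empty, which is why the plotted histogram shows gaps.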