Pandas DataFrame Examples with Plots

Pandas is a popular Python library for data manipulation and analysis.

In [1]:

import pandas as pd

Creating and Exploring Pandas DataFrames

The primary data structure in Pandas is the DataFrame, which represents tabular data. You can create a DataFrame from various data sources, such as CSV files, Excel files, SQL databases, or by using Python lists or dictionaries.
Here's an example of creating a DataFrame from a Python dictionary:

In [2]:

data = {'Name': ['Alice', 'Bob', 'Charlie', 'Tom', 'Eve', 'Frank'],
        'Age': [25, 30, 35, 42, 51, 60],
        'City': ['New York', 'London', 'Paris', 'New York', 'London', 'Tokyo']}
df = pd.DataFrame(data)
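A DataFrame can also be built from Python lists, as mentioned above. The following is a minimal sketch; the variable names `rows`, `df_rows`, and `df_lists` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Each dictionary is one row; the keys become the column names.
rows = [
    {'Name': 'Alice', 'Age': 25, 'City': 'New York'},
    {'Name': 'Bob', 'Age': 30, 'City': 'London'},
]
df_rows = pd.DataFrame(rows)

# A list of lists carries no column names, so they are passed explicitly.
df_lists = pd.DataFrame([['Alice', 25], ['Bob', 30]], columns=['Name', 'Age'])
```

The dictionary-of-columns form used above is usually more convenient when the data is already organized by column, while a list of records fits data that arrives row by row.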

Viewing the data

In [3]:

df.head()  # Returns the first 5 rows of the DataFrame by default.

Out[3]:

	Name	Age	City
0	Alice	25	New York
1	Bob	30	London
2	Charlie	35	Paris
3	Tom	42	New York
4	Eve	51	London

In [4]:

# Returns the last n rows of the DataFrame (here n=3; the default is 5).
df.tail(3)

Out[4]:

	Name	Age	City
3	Tom	42	New York
4	Eve	51	London
5	Frank	60	Tokyo

In [5]:

# Returns the dimensions of the DataFrame (number of rows, number of columns).
df.shape

Out[5]:

(6, 3)

In [6]:

# Returns the column names of the DataFrame.
df.columns

Out[6]:

Index(['Name', 'Age', 'City'], dtype='object')

Accessing and selecting data

In [7]:

# Accesses a specific column by name.
# df['ColumnName'] 

df['Name']

Out[7]:

0      Alice
1        Bob
2    Charlie
3        Tom
4        Eve
5      Frank
Name: Name, dtype: object

In [8]:

# Accesses a specific element using row index and column name.
# df.loc[row_index, column_name] 

df.loc[0, 'Name']

Out[8]:

'Alice'
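Beyond single columns and single elements, rows are often selected by position or by a condition. The sketch below rebuilds the same DataFrame so that it runs on its own; the names `first_age`, `over_40`, and `londoners` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'Tom', 'Eve', 'Frank'],
                   'Age': [25, 30, 35, 42, 51, 60],
                   'City': ['New York', 'London', 'Paris', 'New York', 'London', 'Tokyo']})

# Position-based access with iloc: first row, second column (Age).
first_age = df.iloc[0, 1]

# Boolean mask: keep only the rows where Age is above 40.
over_40 = df[df['Age'] > 40]

# loc accepts a row condition and a list of column labels together.
londoners = df.loc[df['City'] == 'London', ['Name', 'Age']]
```

Here `loc` works with labels and `iloc` with integer positions; with the default integer index the two happen to agree on row numbers, which is not true in general.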

Data summary

In [9]:

df.describe()

Out[9]:

	Age
count	6.000000
mean	40.500000
std	13.217413
min	25.000000
25%	31.250000
50%	38.500000
75%	48.750000
max	60.000000

In [10]:

# Counts how many times each unique value occurs in a column
# df['ColumnName'].value_counts()

df['Name'].value_counts()

Out[10]:

Alice      1
Bob        1
Charlie    1
Tom        1
Eve        1
Frank      1
Name: Name, dtype: int64

In [11]:

# Groups rows by a column and calculates the mean of a numeric column for each group
# df.groupby('ColumnName')['NumericColumn'].mean()

df.groupby('City')['Age'].mean()

Out[11]:

City
London      40.5
New York    33.5
Paris       35.0
Tokyo       60.0
Name: Age, dtype: float64
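Several statistics per group can be computed in one call with `agg`. This is a small sketch on the same data; the variable name `summary` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'Tom', 'Eve', 'Frank'],
                   'Age': [25, 30, 35, 42, 51, 60],
                   'City': ['New York', 'London', 'Paris', 'New York', 'London', 'Tokyo']})

# One row per city, one column per aggregation.
summary = df.groupby('City')['Age'].agg(['mean', 'min', 'max', 'count'])
```

The result is a DataFrame indexed by City, so a single value can be read with, for example, `summary.loc['London', 'mean']`.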

Data manipulation

df['NewColumn'] = ... - Creates a new column from existing data or calculations.
df.drop('ColumnName', axis=1) - Returns a copy of the DataFrame without the given column.
df.sort_values('ColumnName') - Returns a copy of the DataFrame sorted by a column.
df.fillna(value) - Returns a copy with missing values replaced by the specified value.
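The four operations listed above can be sketched on a small DataFrame with one missing value. The data, the fill value of 0, and the column name `AgeInMonths` are assumptions made only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [35, 25, None],
                   'City': ['Paris', 'London', 'Tokyo']})

# Fill the missing age with a placeholder (0 is an arbitrary choice here).
filled = df.fillna({'Age': 0})

# Add a derived column computed from an existing one.
filled['AgeInMonths'] = filled['Age'] * 12

# drop() and sort_values() return new DataFrames; `filled` itself is unchanged.
without_city = filled.drop('City', axis=1)
by_age = filled.sort_values('Age')
```

Because these methods return new objects by default, the result has to be assigned to a variable (or back to `df`) for the change to be kept.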

Reading and Writing Data

In [12]:

# Write the DataFrame to a CSV file. The row index is written as an
# unnamed first column; pass index=False to leave it out.
df.to_csv("myfile.csv")

In [13]:

# Read a CSV file into a Pandas DataFrame.
# The saved row index comes back as the 'Unnamed: 0' column.
df2 = pd.read_csv("myfile.csv")
df2.head()

Out[13]:

	Unnamed: 0	Name	Age	City
0	0	Alice	25	New York
1	1	Bob	30	London
2	2	Charlie	35	Paris
3	3	Tom	42	New York
4	4	Eve	51	London
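The `Unnamed: 0` column in the output above is the row index that `to_csv` wrote to the file. A minimal sketch of a round trip without it follows; an in-memory `io.StringIO` buffer stands in for "myfile.csv" so the example leaves no file behind, and the two-row DataFrame is illustrative.

```python
import io
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# The buffer plays the role of the file on disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)  # index=False: do not write the row index

# Rewind the buffer and read it back.
buffer.seek(0)
df2 = pd.read_csv(buffer)
```

If the file was already written with the index, `pd.read_csv("myfile.csv", index_col=0)` is an alternative that treats the first column as the index when reading.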

Plotting examples

Pandas integrates conveniently with Matplotlib, a popular plotting library in Python. Here are a few examples of plotting data held in a DataFrame.

Line Plot

A line plot shows the trend of a numerical column over time. The example below passes the DataFrame columns directly to Matplotlib's plt.plot function:

In [17]:

import pandas as pd
import matplotlib.pyplot as plt

plt.rcParams.update({'font.size': 20, 'figure.figsize': [14,10]})

# Create a DataFrame with time-series data
data = {'Date': ['2022-01-01', '2022-02-01', '2022-03-01', '2022-04-01'],
        'Value': [10, 20, 15, 25]}
df = pd.DataFrame(data)

# Convert the 'Date' column to datetime type
df['Date'] = pd.to_datetime(df['Date'])

# Plot the line graph
plt.plot(df['Date'], df['Value'])
plt.xlabel('Date')
plt.ylabel('Value')
plt.title('Line Plot')
plt.show()
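The same chart can also be produced with the DataFrame's own `plot` method, which draws with Matplotlib and returns the Axes object. This is a sketch of that variant on the same data; the variable name `ax` is illustrative.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'Date': pd.to_datetime(['2022-01-01', '2022-02-01',
                                           '2022-03-01', '2022-04-01']),
                   'Value': [10, 20, 15, 25]})

# DataFrame.plot draws on a Matplotlib Axes and returns it.
ax = df.plot(x='Date', y='Value', kind='line', title='Line Plot')
ax.set_xlabel('Date')
ax.set_ylabel('Value')
plt.show()
```

Working with the returned Axes keeps all further customization (labels, limits, annotations) tied to that specific plot rather than to Matplotlib's global state.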

Bar Plot

A bar chart compares values across categories. The example below passes the DataFrame columns to Matplotlib's plt.bar function:

In [18]:

import pandas as pd
import matplotlib.pyplot as plt

# Create a DataFrame with categorical data
data = {'Category': ['A', 'B', 'C', 'D'],
        'Value': [10, 20, 15, 25]}
df = pd.DataFrame(data)

# Plot the bar chart
plt.bar(df['Category'], df['Value'])
plt.xlabel('Category')
plt.ylabel('Value')
plt.title('Bar Plot')
plt.show()
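As with the line plot, a pandas-only variant exists: `DataFrame.plot.bar`. The sketch below assumes the same four categories; `legend=False` and `rot=0` are optional styling choices, not requirements.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'Category': ['A', 'B', 'C', 'D'],
                   'Value': [10, 20, 15, 25]})

# One bar per row; x gives the category labels, y the bar heights.
ax = df.plot.bar(x='Category', y='Value', legend=False, rot=0, title='Bar Plot')
ax.set_ylabel('Value')
plt.show()
```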

Histogram

Pandas provides a simple way to create histograms through the plot.hist method of a column. Here's an example of plotting a histogram to visualize the distribution of a numerical column:

In [19]:

import pandas as pd
import matplotlib.pyplot as plt

# Create a DataFrame with numerical data
data = {'Value': [10, 20, 15, 25, 30, 40, 35, 50]}
df = pd.DataFrame(data)

# Plot the histogram
df['Value'].plot.hist(bins=10)  # 'bins' specifies the number of bins or intervals
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.title('Histogram')
plt.show()
