50. What is Pandas ml?
Ans: pandas_ml is a package that integrates pandas, scikit-learn and xgboost into a single package for easy data handling and creation of machine learning models.
Installation
$ pip install pandas_ml
Example
>>> import pandas_ml as pdml
>>> import sklearn.datasets as datasets
# create ModelFrame instance from sklearn.datasets
>>> df = pdml.ModelFrame(datasets.load_digits())
>>> type(df)
<class 'pandas_ml.core.frame.ModelFrame'>
# binarize data (features), not touching target
>>> df.data = df.data.preprocessing.binarize()
>>> df.head()
.target 0 1 2 3 4 5 6 7 8 ... 54 55 56 57 58 59 60 61 62 63
0 0 0 0 1 1 1 1 0 0 0 ... 0 0 0 0 1 1 1 0 0 0
1 1 0 0 0 1 1 1 0 0 0 ... 0 0 0 0 0 1 1 1 0 0
2 2 0 0 0 1 1 1 0 0 0 ... 1 0 0 0 0 1 1 1 1 0
3 3 0 0 1 1 1 1 0 0 0 ... 1 0 0 0 1 1 1 1 0 0
4 4 0 0 0 1 1 0 0 0 0 ... 0 0 0 0 0 1 1 1 0 0
[5 rows x 65 columns]
# split to training and test data
>>> train_df, test_df = df.model_selection.train_test_split()
# create estimator (accessor is mapped to sklearn namespace)
>>> estimator = df.svm.LinearSVC()
# fit to training data
>>> train_df.fit(estimator)
# predict test data
>>> test_df.predict(estimator)
0 4
1 2
2 7
...
448 5
449 8
Length: 450, dtype: int64
# Evaluate the result
>>> test_df.metrics.confusion_matrix()
Predicted 0 1 2 3 4 5 6 7 8 9
Target
0 52 0 0 0 0 0 0 0 0 0
1 0 37 1 0 0 1 0 0 3 3
2 0 2 48 1 0 0 0 1 1 0
3 1 1 0 44 0 1 0 0 3 1
4 1 0 0 0 43 0 1 0 0 0
5 0 1 0 0 0 39 0 0 0 0
6 0 1 0 0 1 0 35 0 0 0
7 0 0 0 0 2 0 0 42 1 0
8 0 2 1 0 1 0 0 0 33 1
9 0 2 1 2 0 0 0 0 1 38
51. What is Pandas Charm?
Ans: pandas-charm is a small Python package for getting character matrices (alignments) into and out of pandas. Use this library to make pandas interoperable with BioPython and DendroPy.
Convert between the following objects:
- BioPython Multiple Seq Alignment <-> pandas DataFrame
- DendroPy Character Matrix <-> pandas DataFrame
- “Sequence dictionary” <-> pandas DataFrame
The code has been tested with Python 2.7, 3.5 and 3.6.
Installation :
$ pip install pandas-charm
You may consider installing pandas-charm and its required Python packages within a virtual environment in order to avoid cluttering your system’s Python path. See for example the environment management system conda or the package virtualenv.
Running the tests
Testing is carried out with pytest:
$ pytest -v test_pandascharm.py
Test coverage can be calculated with Coverage.py using the following commands:
$ coverage run -m pytest
$ coverage report -m pandascharm.py
The code follows the style conventions in PEP8, which can be checked with pycodestyle:
$ pycodestyle pandascharm.py test_pandascharm.py setup.py
Usage
The following examples show how to use pandas-charm. The examples are written in Python 3, but pandas-charm should also work with Python 2.7+. You need to install BioPython and/or DendroPy manually before you start:
$ pip install biopython
$ pip install dendropy
DendroPy CharacterMatrix to pandas DataFrame
>>> import pandas as pd
>>> import pandascharm as pc
>>> import dendropy
>>> dna_string = '3 5\nt1 TCCAA\nt2 TGCAA\nt3 TG-AA\n'
>>> print(dna_string)
3 5
t1 TCCAA
t2 TGCAA
t3 TG-AA
>>> matrix = dendropy.DnaCharacterMatrix.get(
... data=dna_string, schema='phylip')
>>> df = pc.from_charmatrix(matrix)
>>> df
t1 t2 t3
0 T T T
1 C G G
2 C C -
3 A A A
4 A A A
By default, characters are stored as rows and sequences as columns in the DataFrame. If you want rows to hold sequences, just transpose the matrix in pandas:
>>> df.transpose()
0 1 2 3 4
t1 T C C A A
t2 T G C A A
t3 T G - A A
52. How will you add a scalar column with same value for all rows to a pandas DataFrame?
Ans:
Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric Python packages. Pandas is one of those packages and makes importing and analyzing data much easier.
To add a column that holds the same scalar value in every row, assign the scalar to a new column label, for example df['D'] = 5; pandas broadcasts the value to all rows.
The related DataFrame.add() method performs element-wise addition of a dataframe and other (binary operator add). It is equivalent to dataframe + other, but supports substituting a fill_value for missing data in one of the inputs.
Syntax: DataFrame.add(other, axis='columns', level=None, fill_value=None)
Parameters:
other : Series, DataFrame, or constant
axis : {0, 1, 'index', 'columns'} For Series input, the axis on which to match the Series index
fill_value : [None or float value, default None] Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing.
level : [int or name] Broadcast across a level, matching Index values on the passed MultiIndex level
Returns: result DataFrame
import pandas as pd
import numpy as np

# fix the seed so the random values are reproducible
np.random.seed(25)

# 10 rows and 3 columns of random floats
df = pd.DataFrame(np.random.rand(10, 3), columns=['A', 'B', 'C'])
df
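To tie this back to the question, here is a minimal sketch on a small made-up DataFrame. The column names D and E and the values used are illustrative assumptions, not part of the original answer. It contrasts assigning a scalar to a new column with the element-wise add() method.

```python
import numpy as np
import pandas as pd

# Small made-up frame with one missing value (illustrative data).
df = pd.DataFrame({"A": [1, 2, 3], "B": [4.0, np.nan, 6.0]})

# Assigning a scalar to a new label broadcasts it to every row.
df["D"] = 5

# assign() does the same but returns a new DataFrame and leaves df unchanged.
df2 = df.assign(E="same")

# add() is element-wise arithmetic: by default NaN stays NaN,
# while fill_value replaces a missing operand before adding.
plus_one = df.add(1)
plus_one_filled = df.add(1, fill_value=0)

print(df2)
print(plus_one_filled)
```

Plain assignment is usually the simplest way to add a constant column; add() is for arithmetic on existing values.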
53. How can we select a column in pandas DataFrame?
Ans:
Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric Python packages. Pandas is one of those packages and makes importing and analyzing data much easier.
Let’s discuss all different ways of selecting multiple columns in a pandas DataFrame.
Method #1: Basic Method
The example builds a DataFrame from a dictionary whose keys are employee attributes and whose values are lists of those attributes, then selects columns by passing a list of column names.
import pandas as pd

# define a dictionary containing employee data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Age': [27, 24, 22, 32],
        'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
        'Qualification': ['Msc', 'MA', 'MCA', 'Phd']}

# convert the dictionary into a DataFrame
df = pd.DataFrame(data)

# select two columns by name
df[['Name', 'Qualification']]
Select the second to fourth columns:
import pandas as pd

# define a dictionary containing employee data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Age': [27, 24, 22, 32],
        'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
        'Qualification': ['Msc', 'MA', 'MCA', 'Phd']}

# convert the dictionary into a DataFrame
df = pd.DataFrame(data)

# select the columns at positions 1, 2 and 3 (the end of the slice is excluded)
df[df.columns[1:4]]
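The question asks about selecting a column, so a short sketch of the single-column case and of two slice-based alternatives may help. It reuses the employee data from the example above; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Jai", "Princi", "Gaurav", "Anuj"],
    "Age": [27, 24, 22, 32],
    "Address": ["Delhi", "Kanpur", "Allahabad", "Kannauj"],
    "Qualification": ["Msc", "MA", "MCA", "Phd"],
})

# A single label returns a Series; a list of labels returns a DataFrame.
one_column = df["Name"]
one_column_frame = df[["Name"]]

# Label-based slice: both end points are included.
by_label = df.loc[:, "Age":"Qualification"]

# Position-based slice: the end position is excluded.
by_position = df.iloc[:, 1:4]

print(type(one_column).__name__, type(one_column_frame).__name__)
print(by_label.columns.tolist())
```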
54. How can we retrieve a row in pandas DataFrame ?
Ans: Pandas provides the DataFrame.loc[] indexer to retrieve rows from a DataFrame by label. It takes index labels and returns a row (as a Series) or a DataFrame if the labels exist in the caller's index; a missing label raises a KeyError.
Syntax: pandas.DataFrame.loc[ ]
Parameters:
Index label: a string or a list of strings naming the index labels of the rows
Return type: DataFrame or Series, depending on the parameters
Example #1 : Extracting single Row
In this example, the Name column is set as the index, and then two single rows are extracted one by one as Series using their index labels.
import pandas as pd

# use the Name column as the index
data = pd.read_csv("nba.csv", index_col="Name")

# retrieve one row per call
first = data.loc["Avery Bradley"]
second = data.loc["R.J. Hunter"]

print(first, "\n\n\n", second)
Two Series are returned, since a single label was passed each time.
Example #2: Multiple parameters
In this example, the Name column is set as the index, and then two rows are extracted at the same time by passing a list of labels.
import pandas as pd

# use the Name column as the index
data = pd.read_csv("nba.csv", index_col="Name")

# retrieve both rows in one call
rows = data.loc[["Avery Bradley", "R.J. Hunter"]]

print(type(rows))
rows
This time the returned value is a DataFrame. Both rows are extracted and displayed as a new DataFrame.
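Because nba.csv is an external file, here is a self-contained sketch of the same idea. The player names come from the examples above, while the third row and the Team and Number values are made-up placeholders.

```python
import pandas as pd

# Tiny stand-in for the CSV file (placeholder values).
data = pd.DataFrame(
    {"Team": ["X", "Y", "Z"], "Number": [0, 28, 99]},
    index=pd.Index(["Avery Bradley", "R.J. Hunter", "Player C"], name="Name"),
)

# One label -> a Series holding that row.
first = data.loc["Avery Bradley"]

# A list of labels -> a DataFrame, even when the list has one element.
rows = data.loc[["Avery Bradley", "R.J. Hunter"]]

# iloc retrieves rows by position instead of by label.
first_by_position = data.iloc[0]

print(type(first).__name__, type(rows).__name__)
```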
55. How will you convert a DataFrame to an array in pandas?
Ans:
To perform some higher-level mathematical operations, we can convert a Pandas DataFrame to a NumPy array with the DataFrame.to_numpy() function.
DataFrame.to_numpy() is called on a DataFrame and returns a numpy ndarray.
Syntax:
DataFrame.to_numpy(dtype=None, copy=False)
Parameters
- dtype: An optional parameter giving the dtype to pass to numpy.asarray().
- copy: A boolean, default False. When True, it ensures that the returned value is not a view on another array.
Returns
It returns a numpy.ndarray.
Example 1:
import pandas as pd
pd.DataFrame({"P": [2, 3], "Q": [4, 5]}).to_numpy()
info = pd.DataFrame({"P": [2, 3], "Q": [4.0, 5.8]})
info.to_numpy()
info['R'] = pd.date_range('2000', periods=2)
info.to_numpy()
Output :
array([[2, 4.0, Timestamp('2000-01-01 00:00:00')],
[3, 5.8, Timestamp('2000-01-02 00:00:00')]], dtype=object)
Example 2:
import pandas as pd
# initializing the dataframe
info = pd.DataFrame([[17, 62, 35], [25, 36, 54], [42, 20, 15], [48, 62, 76]],
                    columns=['x', 'y', 'z'])
print('DataFrame\n----------\n', info)
# convert the dataframe to a numpy array
arr = info.to_numpy()
print('\nNumpy Array\n----------\n', arr)
Output:
DataFrame
----------
x y z
0 17 62 35
1 25 36 54
2 42 20 15
3 48 62 76
Numpy Array
----------
[[17 62 35]
[25 36 54]
[42 20 15]
[48 62 76]]
56. How can you check if a DataFrame is empty in pandas?
Ans : Pandas DataFrame is a two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both row and column labels. It can be thought of as a dict-like container for Series objects. This is the primary data structure of the Pandas.
The Pandas DataFrame.empty attribute checks whether the dataframe is empty. It returns True if the dataframe is empty and False otherwise.
Syntax: DataFrame.empty
Parameter : None
Returns : bool
Example #1: Use the DataFrame.empty attribute to check whether the given dataframe is empty.
import pandas as pd

# create the dataframe
df = pd.DataFrame({'Weight': [45, 88, 56, 15, 71],
                   'Name': ['Sam', 'Andrea', 'Alex', 'Robin', 'Kia'],
                   'Age': [14, 25, 55, 8, 21]})

# set the row labels
index_ = ['Row_1', 'Row_2', 'Row_3', 'Row_4', 'Row_5']
df.index = index_

print(df)
Now we will use the DataFrame.empty attribute to check whether the given dataframe is empty.
result = df.empty
print(result)
Output :
False
The DataFrame.empty attribute has returned False, indicating that the given dataframe is not empty.
Example #2: Use the DataFrame.empty attribute on a dataframe that contains no data.
# an empty dataframe (assumed here; the original listing omits this step)
df = pd.DataFrame()
result = df.empty
print(result)
Output :
True
The DataFrame.empty attribute has returned True, indicating that the given dataframe is empty.
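A short sketch of two edge cases may be useful here, using made-up frames. A dataframe with column labels but no rows counts as empty, whereas a dataframe that holds only NaN values does not.

```python
import numpy as np
import pandas as pd

# Columns are defined, but there are no rows.
no_rows = pd.DataFrame(columns=["Weight", "Name", "Age"])

# One row whose only value is missing.
only_nan = pd.DataFrame({"Weight": [np.nan]})

print(no_rows.empty)            # True: zero rows
print(only_nan.empty)           # False: NaN still occupies a row
print(only_nan.dropna().empty)  # True once the NaN row is dropped
```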
57. How can you get the sum of values of a column in pandas DataFrame?
Ans: The Pandas dataframe.sum() function returns the sum of the values for the requested axis. If the input is the index axis, it adds all the values in each column and returns a Series containing the sum for every column. To get the sum of a single column, select it first, for example df['col'].sum(). The function can also skip missing values while calculating the sum.
Syntax: DataFrame.sum(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)
(This signature comes from older pandas releases; in pandas 2.0 and later the level parameter has been removed and numeric_only defaults to False.)
Parameters :
axis : {index (0), columns (1)}
skipna : Exclude NA/null values when computing the result.
level : If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
min_count : The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.
Returns : sum : Series or DataFrame (if level specified)
Example #1: Use the sum() function to find the sum of all the values over the index axis.
import pandas as pd

# read the data
df = pd.read_csv("nba.csv")
df
Now find the sum of all values along the index axis. We are going to skip the NaN values in the calculation of the sum.
df.sum(axis=0, skipna=True)
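Since the question is about a single column, here is a self-contained sketch that does not depend on nba.csv. The column names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["a", "b", "c"],
    "Salary": [100.0, np.nan, 250.0],
    "Age": [25, 30, 35],
})

# Sum of one column; NaN is skipped by default.
salary_total = df["Salary"].sum()

# With skipna=False a single NaN makes the result NaN.
salary_strict = df["Salary"].sum(skipna=False)

# min_count demands at least that many valid values, otherwise NaN.
salary_min3 = df["Salary"].sum(min_count=3)

# Column-wise sums restricted to the numeric columns.
numeric_totals = df.sum(numeric_only=True)

print(salary_total)
print(numeric_totals)
```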
58. How will you get the average of values of a column in pandas DataFrame?
Ans:
The Pandas dataframe.mean() function returns the mean of the values for the requested axis. If the method is applied to a pandas Series, it returns a scalar value which is the mean of all the observations in the Series. If it is applied to a pandas DataFrame, it returns a pandas Series containing the mean of the values over the specified axis.
Syntax: DataFrame.mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
(This signature comes from older pandas releases; in pandas 2.0 and later the level parameter has been removed and numeric_only defaults to False.)
Parameters :
axis : {index (0), columns (1)}
skipna : Exclude NA/null values when computing the result
level : If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
Returns : mean : Series or DataFrame (if level specified)
Example : Use the mean() function to find the mean of all the observations over the index axis.
import pandas as pd

# create the dataframe
df = pd.DataFrame({"A": [12, 4, 5, 44, 1],
                   "B": [5, 2, 54, 3, 2],
                   "C": [20, 16, 7, 3, 8],
                   "D": [14, 3, 17, 2, 6]})
df
Let’s use the dataframe.mean() function to find the mean over the index axis.
df.mean(axis=0)
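As a quick check of the example above, this sketch recomputes the means on the same data and also shows the single-column and row-wise forms. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [12, 4, 5, 44, 1],
                   "B": [5, 2, 54, 3, 2],
                   "C": [20, 16, 7, 3, 8],
                   "D": [14, 3, 17, 2, 6]})

# Mean of every column (a Series indexed by column name).
column_means = df.mean(axis=0)

# Mean of one column (a scalar): 66 / 5 = 13.2
mean_a = df["A"].mean()

# Mean of every row (a Series indexed by row label).
row_means = df.mean(axis=1)

print(column_means)
print(mean_a)
```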
59. How will you apply a function to every data element in a DataFrame?
Ans:
One can use the apply() function with axis=1 to apply a function to every row of a given dataframe. Let’s see how to do this.
Example
import pandas as pd

# function that adds three values
def add(a, b, c):
    return a + b + c

def main():
    # create a dictionary with three fields each
    data = {'A': [1, 2, 3],
            'B': [4, 5, 6],
            'C': [7, 8, 9]}

    # convert the dictionary into a DataFrame
    df = pd.DataFrame(data)
    print("Original DataFrame:\n", df)

    # apply add() to every row
    df['add'] = df.apply(lambda row: add(row['A'], row['B'], row['C']), axis=1)

    print('\nAfter Applying Function: ')
    print(df)

if __name__ == '__main__':
    main()
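The example above works row by row. To apply a function to every individual element, one option is sketched below: map the function over each column. The square() helper is a hypothetical example function. Recent pandas releases also offer DataFrame.map (and older ones DataFrame.applymap) for the same purpose; the apply-plus-Series.map form is used here because it does not depend on the pandas version.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

def square(x):
    """Hypothetical element-wise function."""
    return x * x

# apply() hands each column (a Series) to the lambda;
# Series.map() then calls square() on every element.
squared = df.apply(lambda col: col.map(square))

# For simple arithmetic a vectorized expression gives the same values.
vectorized = df ** 2

print(squared)
```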
60. How will you get the top 2 rows from a DataFrame in pandas?
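Ans: Use DataFrame.head(n), which returns the first n rows. Here empDfObj is an existing DataFrame.
# Select the first 2 rows of the Dataframe
dfObj1 = empDfObj.head(2)
print("First 2 rows of the Dataframe : ")
print(dfObj1)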
Output:
First 2 rows of the Dataframe :
Name Age City Experience
a jack 34 Sydney 5
b Riti 31 Delhi 7
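The snippet above assumes that empDfObj already exists. A self-contained sketch follows; the first two rows are taken from the output shown above, and the third row is a made-up placeholder.

```python
import pandas as pd

empDfObj = pd.DataFrame(
    {
        "Name": ["jack", "Riti", "Aadi"],       # third row is a placeholder
        "Age": [34, 31, 16],
        "City": ["Sydney", "Delhi", "Mumbai"],
        "Experience": [5, 7, 11],
    },
    index=["a", "b", "c"],
)

# head(2) returns the first two rows.
dfObj1 = empDfObj.head(2)

# Positional slicing with iloc selects the same rows.
same_rows = empDfObj.iloc[:2]

print(dfObj1)
```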
61. List major features of the Python pandas?
Ans: Some of the major features of Python Pandas are:
- A fast and efficient DataFrame object for handling data.
- Tools for loading data into in-memory data objects from various file formats.
- High-performance merging and joining of data sets.
- Time Series functionality.
- Label-based slicing, fancy indexing, and subsetting of large data sets.
- Reshaping and pivoting of data sets.
62. Enlist different types of Data Structures available in Pandas?
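Ans: The data structures available in Pandas are:
Series – It is a one-dimensional labeled array that can hold data of any type.
DataFrame – It is a tabular data structure which comprises rows and columns. Here, data and size are mutable.
Panel – It is a three-dimensional data structure for storing heterogeneous data. Panel was deprecated and has been removed from recent pandas releases, where a DataFrame with a MultiIndex is used instead.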
63. How To Write a Pandas DataFrame to a File
When you have done your data munging and manipulation with Pandas, you might want to export the DataFrame to another format. This section will cover two ways of outputting your DataFrame: to a CSV or to an Excel file.
Outputting a DataFrame to CSV
To output a Pandas DataFrame as a CSV file, you can use to_csv().
Writing a DataFrame to Excel
Very similar to what you did to output your DataFrame to CSV, you can use to_excel() to write your table to Excel.
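A minimal sketch of both calls, on made-up data. It writes the CSV to an in-memory buffer so that it runs without touching the disk; the file names in the comments are hypothetical, and to_excel() needs an Excel engine such as openpyxl to be installed.

```python
import io

import pandas as pd

df = pd.DataFrame({"x": [1, 2], "y": ["a", "b"]})

# Write CSV text into a buffer; index=False leaves out the row labels.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# Read the text back to confirm the round trip.
round_trip = pd.read_csv(io.StringIO(csv_text))

# Writing to real files works the same way (hypothetical paths):
# df.to_csv("myDataFrame.csv", index=False)
# df.to_excel("myDataFrame.xlsx", sheet_name="Sheet1", index=False)

print(csv_text)
```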
64. When, Why And How You Should Reshape Your Pandas DataFrame
Ans: Reshaping your DataFrame is basically transforming it so that the resulting structure makes it more suitable for your data analysis.
In other words, reshaping is not so much concerned with formatting the values that are contained within the DataFrame, but more about transforming the shape of it.
This answers the when and why. Now onto the how of reshaping your DataFrame.
There are three ways of reshaping that frequently raise questions with users: pivoting, stacking and unstacking and melting.
Pivoting Your DataFrame
You can use the pivot() function to create a new derived table out of your original one. When you use the function, you can pass three arguments:
- Values: this argument allows you to specify which values of your original DataFrame you want to see in your pivot table.
- Columns: whatever you pass to this argument will become a column in your resulting table.
- Index: whatever you pass to this argument will become an index in your resulting table.
When you don’t specify which values you expect in your resulting table, you will pivot by multiple columns. Note that your data cannot contain rows with duplicate values for the columns that you specify as index and columns. If it does, you will get an error message. If you can’t ensure the uniqueness of your data, you will want to use the pivot_table method instead.
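The pivoting behavior described above can be sketched as follows, with made-up sales data. The column names store, product and units are assumptions chosen for illustration.

```python
import pandas as pd

sales = pd.DataFrame({
    "store": ["N", "N", "S", "S"],
    "product": ["apple", "pear", "apple", "pear"],
    "units": [10, 4, 7, 9],
})

# Each (store, product) pair is unique, so pivot() works.
wide = sales.pivot(index="store", columns="product", values="units")

# Repeat the first row to create a duplicate (store, product) pair.
dup = pd.concat([sales, sales.iloc[[0]]], ignore_index=True)

pivot_failed = False
try:
    dup.pivot(index="store", columns="product", values="units")
except ValueError as err:
    pivot_failed = True
    print("pivot failed:", err)

# pivot_table() aggregates the duplicates instead of raising.
totals = dup.pivot_table(index="store", columns="product",
                         values="units", aggfunc="sum")

print(wide)
print(totals)
```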
Using stack() and unstack() to Reshape Your Pandas DataFrame
You have already seen an example of stacking in the answer to question 5, so you already know why you would use it and how to do it.
To repeat, when you stack a DataFrame, you make it taller. You move the innermost column index to become the innermost row index. The result is a DataFrame (or a Series, when the columns have a single level) whose index has a new innermost level of row labels.
Go back to the full walk-through of the answer to question 5 “Splitting Text Into Multiple Columns” if you’re unsure of the workings of stack().
The inverse of stacking is called unstacking. Much like stack(), you use unstack() to move the innermost row index to become the innermost column index.
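A small sketch of the round trip, on a made-up table of scores:

```python
import pandas as pd

df = pd.DataFrame({"math": [90, 80], "art": [70, 60]},
                  index=["ann", "bob"])

# stack(): the column labels become the innermost row index level.
# With single-level columns the result is a Series.
stacked = df.stack()

# unstack(): the innermost row index level moves back to the columns.
restored = stacked.unstack()

print(stacked)
print(restored)
```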
Reshaping Your DataFrame With Melt()
Melting is very useful when you have data in which one or more columns are identifier variables, while all other columns are considered measured variables.
These measured variables are all “unpivoted” to the row axis. That is, while the measured variables were spread out over the width of the DataFrame, melting places them along its height. In other words, your DataFrame becomes longer instead of wider.
As a result, you just have two non-identifier columns, namely, ‘variable’ and ‘value’.
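A brief sketch with made-up measurements, where id is the identifier variable and height and weight are the measured variables:

```python
import pandas as pd

wide = pd.DataFrame({
    "id": [1, 2],
    "height": [170, 180],
    "weight": [65, 80],
})

# Unpivot the measured columns into 'variable' / 'value' pairs.
long = pd.melt(wide, id_vars=["id"], value_vars=["height", "weight"])

print(long)
```

The result has one row per (id, measured variable) pair, so two wide rows become four long rows.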
65. Does Pandas Recognize Dates When Importing Data?
Ans:
Pandas can recognize dates, but you need to help it a tiny bit: add the argument parse_dates when you’re reading in data from, let’s say, a comma-separated value (CSV) file.
There are, however, always weird date-time formats.
(Honestly, who has never had this?)
In such cases, you can construct your own parser to deal with this. You could, for example, make a lambda function that takes your date-time string and parses it with a format string.
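A minimal sketch of both situations, using made-up CSV text read from memory. For the unusual format it converts the column after loading with pd.to_datetime and an explicit format string, which is a swapped-in alternative to passing a custom parser function; the column names and the date layout are assumptions.

```python
import io

import pandas as pd

# Standard ISO dates: parse_dates is enough.
csv_text = "day,value\n2020-01-31,1\n2020-02-29,2\n"
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["day"])

# An unusual layout (day|month|year): load as text, then convert
# with an explicit format string.
odd_text = "day,value\n31|01|2020,1\n29|02|2020,2\n"
odd = pd.read_csv(io.StringIO(odd_text))
odd["day"] = pd.to_datetime(odd["day"], format="%d|%m|%Y")

print(df.dtypes)
print(odd["day"])
```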
66. How To Format The Data in Your Pandas DataFrame?
Ans:
Most of the times, you will also want to be able to do some operations on the actual values that are in your DataFrame.
Keep on reading to find out what the most common Pandas questions are when it comes to formatting your DataFrame’s values!
Replacing All Occurrences of a String in a DataFrame
To replace certain Strings in your DataFrame, you can easily use replace(): pass the values that you would like to change, followed by the values you want to replace them by.
Note that there is also a regex argument that can help you out tremendously when you’re faced with strange string combinations. In short, replace() is mostly what you need to deal with when you want to replace values or strings in your DataFrame by others.
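Both forms of replace() can be sketched on a small made-up DataFrame. The column names, the labels and the regular expression are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "grade": ["Awful", "Poor", "OK"],
    "note": ["id-001", "id-002", "n/a"],
})

# Exact matches: map old values to new ones with a dictionary.
relabelled = df.replace({"Awful": "Bad", "Poor": "Weak"})

# Pattern matches: with regex=True the first argument is a regular
# expression; here it removes a leading 'id-' and any zeros after it.
cleaned = df.replace(r"^id-0*", "", regex=True)

print(relabelled)
print(cleaned)
```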
Removing Parts From Strings in the Cells of Your DataFrame
Removing unwanted parts of strings is cumbersome work. Luckily, there is a solution in place! You use map() on the column result to apply a lambda function element-wise to the column. The function takes the string value, strips any + or - on the left, and also strips any of the six characters aAbBcC on the right.
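The description above can be sketched as follows. The column name result comes from the text, while the sample values are made up.

```python
import pandas as pd

df = pd.DataFrame({"result": ["+3b", "-1a", "5C", "+2"]})

# lstrip removes any leading '+' or '-' characters;
# rstrip removes any trailing characters from the set aAbBcC.
df["result"] = df["result"].map(
    lambda x: x.lstrip("+-").rstrip("aAbBcC")
)

print(df)
```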
Splitting Text in a Column into Multiple Rows in a DataFrame
Splitting your text into multiple rows is quite complex. In outline, you split the string in each cell into a list and then move each list element onto its own row.
Applying A Function to Your Pandas DataFrame’s Columns or Rows
You might want to adjust the data in your DataFrame by applying a function to it. You can do this with apply(), which works column by column with axis=0 and row by row with axis=1.
67. How To Add an Index, Row or Column to a Pandas DataFrame?
Ans: Now that you have learned how to select a value from a DataFrame, it’s time to get to the real work and add an index, row or column to it!
Adding an Index to a DataFrame
When you create a DataFrame, you have the option to add input to the ‘index’ argument to make sure that you have the index that you desire. When you don’t specify this, your DataFrame will have, by default, a numerically valued index that starts with 0 and continues until the last row of your DataFrame.
However, even when your index is specified for you automatically, you still have the power to re-use one of your columns and make it your index. You can easily do this by calling set_index() on your DataFrame.
Adding Rows to a DataFrame
Before you can get to the solution, it’s first a good idea to grasp the concept of loc and how it differs from other indexing attributes such as .iloc and .ix:
- loc works on labels of your index. This means that if you give in loc[2], you look for the values of your DataFrame that have an index labeled 2.
- iloc works on the positions in your index. This means that if you give in iloc[2], you look for the values of your DataFrame that are at position 2.
- ix is a more complex case: when the index is integer-based, you pass a label to ix. ix[2] then means that you’re looking in your DataFrame for values that have an index labeled 2. This is just like loc! However, if your index is not solely integer-based, ix will work with positions, just like iloc. Note that .ix was deprecated and has been removed in pandas 1.0, so it only applies to older code.
Now that the difference between iloc, loc and ix is clear, you are ready to give adding rows to your DataFrame a go!
As a consequence of what has just been explained, the general recommendation is that you use .loc to insert rows in your DataFrame.
If you used df.ix[], you might try to reference a numerically valued index with the index value and accidentally overwrite an existing row of your DataFrame.
You had better avoid this!
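A short sketch of adding a row with .loc, on a made-up DataFrame whose integer labels deliberately differ from the row positions:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=[10, 20])

# A label that does not exist yet: loc appends a new row.
df.loc[30] = [5, 6]

# A label that already exists: loc overwrites that row.
df.loc[10] = [0, 0]

# iloc works by position, so position 0 is the row labelled 10.
first_row = df.iloc[0]

print(df)
```

Note that iloc cannot be used to add a row this way: assigning to a position beyond the last row raises an IndexError.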
Adding a Column to Your DataFrame
In some cases, you want to make your index part of your DataFrame. You can easily do this by taking a column from your DataFrame or by referring to a column that you haven’t made yet and assigning it to the .index property.
However, if you want to append columns to your DataFrame, you could also follow the same approach as adding an index to your DataFrame: you use loc or iloc.
Note that the observation that was made earlier about loc still stays valid also for when you’re adding columns to your DataFrame!
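A few common ways of adding a column are sketched below on made-up data. The column names are illustrative, and insert() is shown as an additional option that the text above does not mention.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]})

# Plain assignment creates the column at the end.
df["B"] = df["A"] * 2

# loc with a new column label also creates the column.
df.loc[:, "C"] = [7, 8, 9]

# insert() places the new column at a chosen position.
df.insert(0, "first", ["x", "y", "z"])

print(df)
```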
Resetting the Index of Your DataFrame
When your index doesn’t look entirely the way you want it to, you can opt to reset it. This can easily be done with .reset_index().
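Setting and resetting the index can be sketched together, on a made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"name": ["ann", "bob"], "score": [90, 80]})

# Promote an existing column to the index.
indexed = df.set_index("name")

# reset_index() moves the index back into a regular column...
restored = indexed.reset_index()

# ...while drop=True discards the old index instead of keeping it.
dropped = indexed.reset_index(drop=True)

print(indexed)
print(restored)
```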
68. How To Select an Index or Column From a Pandas DataFrame?
Before you start with adding, deleting and renaming the components of your DataFrame, you first need to know how you can select these elements.
So, how do you do this?
Well, in essence, selecting an index, column or value from your DataFrame isn’t that hard. It’s really very similar to what you see in other languages that are used for data analysis (and which you might already know!).
Let’s take R for example. You use the [,] notation to access the data frame’s values. In Pandas DataFrames, this is not too much different: the most important constructions to use are, without a doubt, loc and iloc. The subtle differences between these two will be discussed in the next sections. For now, it suffices to know that you can either access the values by calling them by their label or by their position in the index or column.
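The difference between label-based and position-based access is easiest to see when the labels are not in positional order. A minimal sketch with made-up data:

```python
import pandas as pd

# Row labels are 2, 0, 1 on purpose, so labels and positions disagree.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=[2, 0, 1])

# loc: the row LABELLED 2 (the first row) and the column named "A".
by_label = df.loc[2, "A"]

# iloc: the row at POSITION 2 (the third row) and the column at position 0.
by_position = df.iloc[2, 0]

print(by_label, by_position)
```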
69. How will you get the average of values of a column in pandas DataFrame?
Ans:
Steps to get the Average for each Column and Row in Pandas DataFrame
Step 1: Gather the data
To start, gather the data that needs to be averaged.
For example, take the commission earned by 3 employees over the first 6 months of the year (the figures are listed in the code in Step 2).
The goal is to get the average of the commission earned:
- For each employee over the first 6 months (average by column)
- For each month across all employees (average by row)
Step 2: Create the DataFrame
Next, create the DataFrame in order to capture the above data in Python:
import pandas as pd

data = {'Month': ['Jan ','Feb ','Mar ','Apr ','May ','Jun '],
        'Jon Commission': [7000,5500,6000,4500,8000,6000],
        'Maria Commission': [10000,7500,6500,6000,9000,8500],
        'Olivia Commission': [3000,6000,4500,4500,4000,5500],
        }

df = pd.DataFrame(data, columns=['Month','Jon Commission','Maria Commission','Olivia Commission'])
print(df)
Run the code in Python to print the DataFrame.
Step 3: Get the Average for each Column and Row in Pandas DataFrame
You can then apply the following syntax to get the average for each column. Because the Month column holds text, pass numeric_only=True so that only the commission columns are averaged (recent pandas versions raise an error otherwise):
df.mean(axis=0, numeric_only=True)
For our example, this is the complete Python code to get the average commission earned for each employee over the first 6 months (average by column):
import pandas as pd

data = {'Month': ['Jan ','Feb ','Mar ','Apr ','May ','Jun '],
        'Jon Commission': [7000,5500,6000,4500,8000,6000],
        'Maria Commission': [10000,7500,6500,6000,9000,8500],
        'Olivia Commission': [3000,6000,4500,4500,4000,5500]
        }

df = pd.DataFrame(data, columns=['Month','Jon Commission','Maria Commission','Olivia Commission'])

# average of each numeric column
av_column = df.mean(axis=0, numeric_only=True)
print(av_column)
Run the code, and you’ll get the average commission per employee: about 6166.67 for Jon, 7916.67 for Maria and 4583.33 for Olivia.
Alternatively, you can get the average for each row using the following syntax:
df.mean(axis=1, numeric_only=True)
Here is the code that you can use to get the average commission earned for each month across all employees (average by row):
import pandas as pd

data = {'Month': ['Jan ','Feb ','Mar ','Apr ','May ','Jun '],
        'Jon Commission': [7000,5500,6000,4500,8000,6000],
        'Maria Commission': [10000,7500,6500,6000,9000,8500],
        'Olivia Commission': [3000,6000,4500,4500,4000,5500],
        }

df = pd.DataFrame(data, columns=['Month','Jon Commission','Maria Commission','Olivia Commission'], index=['Jan ','Feb ','Mar ','Apr ','May ','Jun '])

# average of the numeric values in each row
av_row = df.mean(axis=1, numeric_only=True)
print(av_row)
Once you run the code in Python, you’ll get the average commission earned per month, for example 5000.0 for April and 7000.0 for May.
70. How to Apply function to every row in a Pandas DataFrame?
Ans: Python is a great language for performing data analysis tasks. It provides a large number of classes and functions that make analyzing and manipulating data easier.
One can use the apply() function with axis=1 to apply a function to every row of a given dataframe. Let’s see how to do this.
Example
import pandas as pd

# function that adds three values
def add(a, b, c):
    return a + b + c

def main():
    # create a dictionary with three fields each
    data = {'A': [1, 2, 3],
            'B': [4, 5, 6],
            'C': [7, 8, 9]}

    # convert the dictionary into a DataFrame
    df = pd.DataFrame(data)
    print("Original DataFrame:\n", df)

    # apply add() to every row
    df['add'] = df.apply(lambda row: add(row['A'], row['B'], row['C']), axis=1)

    print('\nAfter Applying Function: ')
    print(df)

if __name__ == '__main__':
    main()