Module 7 — Python Libraries For Statistics

A.I Hub
3 min read · Jul 30, 2023


In this step-by-step guide, we walk through the Python pandas library, covering its essential built-in functions and how to work with Series and DataFrames using the pandas package. Let's dive in!

Pandas

Statistical data acquired from different sources is in a raw form that has to be cleaned and prepared for further analysis. Pandas is a library for data preparation and cleaning. Pandas mainly uses two data structures:

  • Series: a one-dimensional structure, much like a list of items.
  • DataFrame: a structure that acts like a table or a matrix with multiple columns.

Pandas provides data cleaning features such as replacing missing values and joining or merging data from different sources. By convention, we import Pandas and use pd as its alias.
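As a rough sketch of these cleaning features, the snippet below fills a missing value and merges two small tables. The student names, scores, and ages are made-up illustration data, and filling with the column mean is just one possible strategy.

```python
import numpy as np
import pandas as pd

# Hypothetical data from two "sources"; all names and values are invented.
scores = pd.DataFrame({"student": ["Ann", "Ben", "Cara"],
                       "score": [82.0, np.nan, 91.0]})
ages = pd.DataFrame({"student": ["Ann", "Ben", "Cara"],
                     "age": [20, 22, 21]})

# Replace the missing score with the mean of the observed scores.
scores["score"] = scores["score"].fillna(scores["score"].mean())

# Merge the two tables on the shared "student" column.
merged = pd.merge(scores, ages, on="student")
print(merged)
```

Here fillna() handles the missing value and pd.merge() combines the two sources on a common key.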

Series: Similar to a 1-dimensional NumPy array, the Series data structure handles 1-dimensional data. However, unlike NumPy arrays, it offers extra features to pre-process data. The constructor pd.Series() is used to create a Series object.
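As a minimal sketch, creating a Series from a plain Python list might look like this. The name my_series and the values are illustrative assumptions, not taken from the original article.

```python
import pandas as pd

# Build a Series from a Python list; pandas assigns the index 0..3 by default.
my_series = pd.Series([10, 20, 30, 40])
print(my_series)
# 0    10
# 1    20
# 2    30
# 3    40
# dtype: int64
```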

The last line of the output, dtype: int64, indicates that the values of the series are 64-bit integers. A Series object contains two arrays, index and values, that are linked to each other. In the printed output of a series, the first array on the left holds the index of the data, while the second array on the right contains the actual values of the series. Series objects can also be generated from NumPy arrays. Instead of the default numeric indices, descriptive indices can be assigned: the index option in the pd.Series() constructor can, for example, assign letters as indices to the values.
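A short sketch of both ideas, building a Series from a NumPy array and assigning letters as indices, could look like the following. The array contents and the name letters are assumptions made for this example.

```python
import numpy as np
import pandas as pd

# Start from a NumPy array and label each value with a letter.
values = np.array([5, 10, 15, 20])
letters = pd.Series(values, index=["a", "b", "c", "d"])
print(letters)

# The two linked arrays can be inspected separately.
print(letters.index)    # the labels
print(letters.values)   # the underlying data

# A value can now be looked up by its descriptive label.
print(letters["c"])
```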

DataFrame: The second data structure used by Pandas is the DataFrame, which is similar to a 2-dimensional NumPy array. A DataFrame contains an ordered group of columns. Each column can hold values of a different type: numeric, string or Boolean.

To create a DataFrame, we pass a dictionary to the constructor DataFrame(). The keys of this dictionary become the column names and the corresponding values become the column data. Typically, a dictionary, here called d, is created first and then used as the input to the DataFrame() constructor to create a dataframe, here called df. Data can then be read back from the dataframe by column or by row.
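The steps above can be sketched as follows. The names d and df come from the text, while the column names and values are invented for illustration.

```python
import pandas as pd

# Dictionary whose keys become column names (contents are hypothetical).
d = {"name": ["Ann", "Ben", "Cara"],
     "age": [20, 22, 21],
     "passed": [True, False, True]}

# Build the dataframe from the dictionary.
df = pd.DataFrame(d)
print(df)

# Getting data back out of the dataframe.
ages = df["age"]            # one column, returned as a Series
first_row = df.loc[0]       # one row, selected by its index label
ben_age = df.loc[1, "age"]  # a single cell: row label 1, column "age"
print(df.dtypes)            # each column has its own type
```

Note that the three columns hold string, numeric and Boolean values respectively, which matches the description of DataFrame columns above.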

Conclusion

In this step-by-step guide, we took the pandas library for a test drive and saw how useful it is when handling large volumes of data: its built-in functions make it easy to manipulate data.
