In this step-by-step guide, we will walk through the Python pandas library, covering its most commonly used built-in functions and showing how to work with Series and DataFrames. Let's dive in!
Pandas
Statistical data acquired from different sources usually arrives in a raw form that has to be cleaned and prepared before further analysis. Pandas is a Python library for data preparation and cleaning. It mainly uses two data structures:
- Series: a one-dimensional labeled array, similar to a list of items.
- DataFrame: a two-dimensional structure that acts like a table or a matrix with multiple columns.
Pandas provides data-cleaning features such as replacing missing values and joining and merging data from different sources. By convention, pandas is imported with the alias pd.
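The cleaning features mentioned above can be sketched with a small example. It assumes two hypothetical tables that share an id column: fillna() replaces the missing score, and pd.merge() joins the two tables on id. The data and column names are illustrative only.

```python
import numpy as np
import pandas as pd  # conventional alias

# Hypothetical sample data: two small tables sharing an "id" column
scores = pd.DataFrame({"id": [1, 2, 3], "score": [88.0, np.nan, 75.0]})
names = pd.DataFrame({"id": [1, 2, 3], "name": ["Ann", "Ben", "Cara"]})

# Replace the missing value in the "score" column with 0.0
cleaned = scores.fillna({"score": 0.0})

# Join the two tables on their shared "id" column
merged = pd.merge(cleaned, names, on="id")
print(merged)
```

pd.merge() performs an inner join by default, so only id values present in both tables appear in the result.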
Series: Similar to a one-dimensional NumPy array, the Series data structure holds one-dimensional data. Unlike NumPy arrays, however, it offers extra features for preprocessing data, such as a labeled index. The pd.Series() constructor is used to create a Series object.
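A minimal sketch of creating a Series, assuming arbitrary example values: when no index is given, pandas assigns the default integer labels 0, 1, 2, and so on, and the printed output ends with a dtype line (here dtype: int64).

```python
import pandas as pd

# Create a Series from a plain Python list of integers;
# pandas assigns the default index 0, 1, 2, 3
s = pd.Series([10, 20, 30, 40])
print(s)
```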
The last line of the output, dtype: int64, indicates that the values in the series are 64-bit integers. A Series object contains two linked arrays: the index and the values. In the printed output, the column on the left shows the index of each element, while the column on the right contains the actual values of the series.

Series objects can also be created from NumPy arrays. Instead of the default numeric indices, descriptive indices can be assigned: the index parameter of the pd.Series() constructor assigns labels such as letters to the values.
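A short sketch combining both ideas, assuming arbitrary float values: the Series is built from a NumPy array, given letter labels through index=, and then its two linked arrays are inspected through the .index and .values attributes.

```python
import numpy as np
import pandas as pd

values = np.array([1.5, 2.5, 3.5])

# Assign descriptive letter labels instead of the default 0, 1, 2
s = pd.Series(values, index=["a", "b", "c"])

print(s.index)   # the index array holding the labels 'a', 'b', 'c'
print(s.values)  # the underlying values array
print(s["b"])    # look up a single value by its label
```

Label lookup such as s["b"] is the main practical advantage of descriptive indices over positions.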
DataFrame: The second data structure used by pandas is the DataFrame, which is similar to a two-dimensional NumPy array. A DataFrame contains an ordered collection of columns, and each column can hold values of a different type, such as numeric, string, or Boolean.
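To illustrate the comparison with a two-dimensional NumPy array, the sketch below (using made-up values and hypothetical column names x, y, label, and big) wraps a 2-D array in a DataFrame and then adds a string column and a Boolean column. New columns are appended at the end, and each column keeps its own dtype.

```python
import numpy as np
import pandas as pd

# A DataFrame built from a 2-D NumPy array, with column names supplied
grid = np.array([[1, 2], [3, 4], [5, 6]])
df = pd.DataFrame(grid, columns=["x", "y"])

# Columns keep their order; assigning a new column appends it at the end
df["label"] = ["low", "mid", "high"]  # string column
df["big"] = df["x"] > 2               # Boolean column derived from "x"

print(df.shape)   # (rows, columns)
print(df.dtypes)  # one dtype per column
```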
One common way to create a DataFrame is to pass a dictionary to the DataFrame() constructor. Each key of the dictionary becomes a column name, and its corresponding value (for example, a list) supplies that column's data. For example, a dictionary named d can be created and then used as input to the DataFrame() constructor to build a DataFrame.
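A minimal sketch of this approach, assuming hypothetical keys and values (the original program's data is not shown): the dictionary d is turned into a DataFrame named df, and a few common ways of retrieving data follow.

```python
import pandas as pd

# Hypothetical data: each key becomes a column name,
# each list supplies that column's values
d = {
    "name": ["Ann", "Ben", "Cara"],
    "age": [28, 35, 42],
    "member": [True, False, True],
}
df = pd.DataFrame(d)

print(df)
print(df["age"])           # select one column (returns a Series)
print(df.loc[1])           # select the row whose index label is 1
print(df.loc[0, "name"])   # select a single value by row label and column name
print(df[df["age"] > 30])  # keep only rows that satisfy a Boolean condition
```

Selecting a single column returns a Series, which ties the two data structures together.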
This Python code uses a dictionary object to create a DataFrame named df. The program also shows how to retrieve data from the DataFrame.
Conclusion
In this step-by-step guide, we took a test drive of the pandas library and saw how useful it is when handling large volumes of data: its built-in functions make it easy to clean, combine, and manipulate data.