Trying to Upload Excel in Python in Jupyter Notebook

April 16, 2022

In this tutorial we will learn to write a simple Python script for reading data files.

For this tutorial I am going to assume that you have some idea about using either Jupyter notebook or Python in general. I also assume that you have Anaconda installed, or know how to install packages into Python. If you do not, then I would first suggest putting a few minutes aside for installing Anaconda and taking a crash course in Jupyter.

We are reading this data file

Here is a snapshot of what my data file looks like:

It is a 2-column comma-separated values (CSV) file. The wavelength (x-axis) is in the first column and the light intensity (y-axis) measured at that wavelength is in the second column. It is intensity vs. wavelength here; this may be something else for you: population vs. year, temperature vs. month, or sepal width vs. sepal length, etc. You can try following this tutorial with the iris dataset (exact solution at the end).

The Excel snapshot of the data shown here is for representation purposes. In reality, the data columns are thousands of rows long. The point is, you may choose to plot such data as a scatter plot or a line plot.

Importing the relevant libraries

Reading a data file into a Python Jupyter notebook is simple. When you install Anaconda, it comes with a version of Python that has the Pandas library pre-installed.

Start your Jupyter notebook and type the following in your cell.

import pandas as pd

This imports the pandas module, and all of the useful functions inside it can be used with the "pd." prefix. Similarly, the other scientific computing favorite, "numpy", is usually imported as "np", and you do it exactly as you did with pandas. Add the following to the next line of your code:

import numpy as np

Pandas (pd) is a great library for handling large datasets, but in this lesson we will use only a small part of it to read our data file. NumPy (np) is a library that makes life easier when handling arrays, hence we import that too. We will see in the following code how these are used.

Running your code at this point will appear to do nothing, but a lot happens behind the scenes: the libraries are imported into your code. You will certainly see errors if the libraries you try to import were not installed properly.
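If you want to confirm that both imports worked, a quick check (my addition, not part of the original lesson) is to print each library's version string. Both pandas and NumPy expose a __version__ attribute; the numbers you see will depend on your installation.

```python
# Import the two libraries used throughout this lesson
import pandas as pd
import numpy as np

# If either import had failed, Python would have raised an ImportError above.
# Printing the versions shows which installation the notebook is using.
print("pandas version:", pd.__version__)
print("numpy version:", np.__version__)
```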

Filename string

To read a file, your code needs to be told its location. The location, a string, can be stored in a variable such as "filename", and it will be the file's path on your computer. A string in Python is surrounded by either single quotation marks or double quotation marks.

Say your file is in the same folder as your Jupyter notebook and is called "P2_maxint_0500x005ms_532nm_00095uW_540_900nm_temp_297.75K_t001.csv". In this case you will start by storing your file name in an arbitrary variable, let's say "filename", like this:

filename = 'P2_maxint_0500x005ms_532nm_00095uW_540_900nm_temp_297.75K_t001.csv'

Specifying a path

But since files are not always conveniently inside the same folder as your code, you can also store the full path of the file in this string. This is always better. So now, say your file is 'D:/data/P2_maxint_0500x005ms_532nm_00095uW_540_900nm_temp_297.75K_t001.csv'; then you store your full path+filename inside the filename variable as follows:

filename = 'D:/data/P2_maxint_0500x005ms_532nm_00095uW_540_900nm_temp_297.75K_t001.csv'
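Before reading, it can help to check that the path actually points at a file, since a typo in a long filename like this one is easy to make. The sketch below is my own addition; it uses os.path.exists from Python's standard library and simply reports whether the file was found.

```python
import os

# Full path to the data file (same example path as above)
filename = 'D:/data/P2_maxint_0500x005ms_532nm_00095uW_540_900nm_temp_297.75K_t001.csv'

# os.path.exists returns True only if something exists at this path
file_exists = os.path.exists(filename)
if file_exists:
    print("Found:", filename)
else:
    print("Not found, check the folder and spelling:", filename)
```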

Note that the slashes in the path are forward slashes, unlike the backslashes Windows often uses. Single backslashes are not safe here. If you still want to use backslashes, you will have to replace each single backslash with a double one as follows:

filename = 'D:\\data\\P2_maxint_0500x005ms_532nm_00095uW_540_900nm_temp_297.75K_t001.csv'

This is because the backslash is the escape character in Python strings: it is used to write special characters such as a new line or a tab, using '\n' or '\t'. So when you use a single backslash, Python may not read it as a literal character. You need to use the escape sequence "\\" for Python to read it as a single backslash. Other useful escape characters can be found here.

We will learn smarter ways to get the path more easily, by browsing files in the file explorer window, in the following lessons.
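The short sketch below (my addition) compares the ways of writing a Windows path. It also includes a raw string, written with an r prefix, in which Python does not treat backslashes as escape characters; this option is not mentioned in the original lesson. The last lines show why a single backslash is risky: in 'D:\temp.csv' the \t silently becomes a tab character. The paths used here are hypothetical.

```python
from pathlib import PureWindowsPath

# Three ways of writing the same hypothetical Windows path
forward = 'D:/data/file.csv'      # forward slashes
escaped = 'D:\\data\\file.csv'    # each backslash doubled
raw = r'D:\data\file.csv'         # raw string: backslashes kept literally

# The escaped and raw versions produce exactly the same string
print(escaped == raw)

# PureWindowsPath accepts both separators, so all three name the same path
print(PureWindowsPath(forward) == PureWindowsPath(raw))

# Pitfall: a single backslash followed by 't' becomes a tab character
broken = 'D:\temp.csv'
print('\t' in broken)
```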

Reading the file

To read the file using our imported pandas library, all you have to do now is use the "read_csv" function from pandas as follows:

pd.read_csv(filename)

As long as you have a file with column-like data (shown previously) in it, you will immediately get a table as the output, which for the type of data I showed above would look like this:

If this shows up correctly, it means that your data file was read successfully. There are certain problems with the header; we will fix that later. To store this pandas object in a variable called "data", for instance, we modify our line of code to assign the pandas object to 'data'.

data = pd.read_csv(filename)

Another good way of checking whether your data was split into the right columns is to use the head function to display a quick preview of the data. Since you have assigned the object to the 'data' variable already, you can display the head of the data using the following:

data.head()
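To see what read_csv and head produce without needing the original file, the sketch below feeds pandas a few made-up rows through io.StringIO, which behaves like an open file. The numbers are hypothetical and only mimic the two-column layout described earlier. Note how, with the default settings, pandas uses the first data row as the column names; this is the header problem fixed later in this lesson.

```python
import io
import pandas as pd

# Hypothetical two-column data (wavelength, intensity) with no header row
csv_text = "540.0,12\n540.5,15\n541.0,13\n541.5,18\n"

# io.StringIO lets read_csv treat the string as if it were a file on disk
data = pd.read_csv(io.StringIO(csv_text))

# By default read_csv treats the first line as column names,
# so only three data rows remain and the columns are named '540.0' and '12'
print(data.head())
print(list(data.columns))
```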

Trick to reading data in ASCII files

In case you want your program to be robust enough to read most ASCII files with arbitrary delimiters using a single line of code, I've found the following to be very useful. This line of code should be good enough to read tab-separated, space-separated, or other delimiter-separated files.

data = pd.read_csv(filename, sep=None, engine='python')
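Here is a small check of that behaviour, using hypothetical tab-separated text instead of a real file (the column names are made up). With sep=None and the Python engine, pandas uses Python's csv.Sniffer to guess the delimiter, so no explicit sep='\t' is needed. Sniffing is a heuristic, so for unusual files it is still worth confirming the result with head().

```python
import io
import pandas as pd

# Hypothetical tab-separated text with a header line
tsv_text = "wavelength\tintensity\n540.0\t12\n540.5\t15\n541.0\t13\n"

# sep=None lets the Python engine detect the delimiter automatically
data = pd.read_csv(io.StringIO(tsv_text), sep=None, engine='python')

print(data.head())
print(data.shape)  # (rows, columns)
```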

Adjusting the header

Since our data has no header, we can add an argument to the read_csv function to tell it that there's no header in the file. To do this we use the 'header' argument with the value None. Let's change our code to look as follows:

data = pd.read_csv(filename, sep=None, engine='python', header=None)
data.head()

This will show you the data with the columns labelled simply 0 and 1. This is what we want.

If your file does have text headers (like the iris dataset), then you probably wouldn't need to do this.
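The same kind of hypothetical rows read with header=None keep all of their data, and the columns get the integer labels 0 and 1. If you prefer descriptive labels, read_csv also accepts a names argument; the names 'wavelength' and 'intensity' below are my own choice, not something stored in the original file.

```python
import io
import pandas as pd

# Hypothetical header-less two-column data
csv_text = "540.0,12\n540.5,15\n541.0,13\n541.5,18\n"

# header=None: every line is data, columns are labelled 0 and 1
data = pd.read_csv(io.StringIO(csv_text), sep=None, engine='python', header=None)
print(data.head())

# Optional: supply your own column names instead of 0 and 1
named = pd.read_csv(io.StringIO(csv_text), sep=None, engine='python',
                    header=None, names=['wavelength', 'intensity'])
print(named.head())
```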

Obtaining raw data array

To obtain the raw 2D array out of this pandas data object, you can use the values attribute of the object like this:

rawdata = data.values
print(rawdata)

and the output will show the raw 2D array of numbers.

Pandas does a great job of reading large files fast. I use it mostly just for that. From here onward, I find it easier to do all of the data processing on the raw 2D arrays.
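Current pandas documentation recommends the to_numpy() method as the equivalent of the .values attribute. The sketch below, using hypothetical numbers, shows that both give a plain 2D NumPy array whose shape is (rows, columns).

```python
import io
import numpy as np
import pandas as pd

# Hypothetical header-less data
csv_text = "540.0,12\n540.5,15\n541.0,13\n"
data = pd.read_csv(io.StringIO(csv_text), header=None)

# Both give a 2D NumPy array of the table's values
rawdata = data.values
also_raw = data.to_numpy()

print(type(rawdata))
print(rawdata.shape)  # (number of rows, number of columns)
print(np.array_equal(rawdata, also_raw))
```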

Obtaining each column array

Say your x-axis data is in the first column. To get all of the rows from the first column into a variable 'x', we append [:,0] to the rawdata variable. This tells Python to fetch all rows (:) from the 0th column. To get the second column you would use [:,1]. It is important to remember that indexing always starts from 0 in Python.

So now, our code written all together looks as follows:
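Column indexing can be tried on a small hand-made NumPy array before using real data. In the sketch below the array stands in for rawdata (the values are hypothetical); [:, 0] picks every row of the first column and [:, 1] every row of the second.

```python
import numpy as np

# Hypothetical stand-in for rawdata: 3 rows, 2 columns
rawdata = np.array([[540.0, 12.0],
                    [540.5, 15.0],
                    [541.0, 13.0]])

x = rawdata[:, 0]  # all rows, column 0 (wavelength)
y = rawdata[:, 1]  # all rows, column 1 (intensity)

print(x)
print(y)
```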

import pandas as pd
import numpy as np

# Path to the data file, relative to the notebook's folder
filename = 'data/L1/P2_maxint_0500x005ms_532nm_00095uW_540_900nm_temp_297.75K_t001.csv'
# Detect the delimiter automatically; the file has no header row
data = pd.read_csv(filename, sep=None, engine='python', header=None)
rawdata = data.values   # raw 2D NumPy array
x = rawdata[:,0]        # all rows of column 0
y = rawdata[:,1]        # all rows of column 1

print("All rows of column 0: ", x)
print("All rows of column 1: ", y)

This output gives me the individual arrays to plot or analyse. The output should now look as follows:

Specific rows, say the part of the data from row 10 to row 99 of column 1 (the end index 100 is excluded), can be extracted as follows:

y_cut = rawdata[10:100,1]
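If you slice y like this, x must be cut with the same row range so that both arrays still line up for plotting. The sketch below uses a hypothetical 1000-row array in place of real data to show that the slice 10:100 returns 90 rows for each column.

```python
import numpy as np

# Hypothetical stand-in for rawdata: 1000 rows, 2 columns
rawdata = np.column_stack([np.linspace(540, 900, 1000),
                           np.arange(1000, dtype=float)])

# Rows 10 up to (but not including) 100, same range for both columns
x_cut = rawdata[10:100, 0]
y_cut = rawdata[10:100, 1]

print(len(x_cut), len(y_cut))  # both 90
```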

Plotting our data

To have something to show for the work we just put in, let's plot our data. One of the easiest libraries to work with when plotting data in Python is matplotlib. It is highly customizable and can do wonders within a few lines of code.

We can plot the x and y variables by importing a part of the library (matplotlib.pyplot), using the plot function, and showing the result. Make sure the number of rows in both your columns is the same. So if you slice the rows (as in the previous step) for x, do the same slicing for y as well.

We will add the following three lines of code to the script to plot the data:

import matplotlib.pyplot as plt
plt.plot(x,y)
plt.show()

The first line here imports the matplotlib.pyplot library as 'plt'. This 'plt' is arbitrary. You can use anything instead of those three letters, as long as you use the same letters when calling functions from that library. Many people like to use 'plt', so I'm using that.

Although I am importing the library here, I suggest doing all of these imports together in the first section of your script.

The second line creates a line plot with the x array on the x-axis and the y array on the y-axis. All of this is done by Python and stored in memory. It won't be displayed until you type in the third line. Your output should show the following plot if you used the same data file I did.
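If you would like axis labels on the plot, pyplot provides xlabel and ylabel functions. The sketch below is my own addition: it uses hypothetical data, and it saves the figure to an in-memory buffer with the non-interactive Agg backend so it can run outside a notebook. In Jupyter you would keep plt.show() instead of the savefig line.

```python
import io
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical x and y arrays of equal length (a single peak)
x = np.linspace(540, 900, 200)
y = np.exp(-((x - 700) / 40) ** 2)

plt.plot(x, y)
plt.xlabel('Wavelength')
plt.ylabel('Intensity')

# Save to memory instead of calling plt.show()
buffer = io.BytesIO()
plt.savefig(buffer, format='png')
ax = plt.gca()
```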

For reading the data in the Iris dataset

Since the iris dataset has headers, I just removed the header=None argument from read_csv. And I replaced the plot function with the scatter function to draw a scatter plot instead of a line plot (which was a mess in this case). With these two modifications the script worked like a charm. Here's the code I used to plot the sepal_width (y-axis, column number 1) vs. the sepal_length (x-axis, column number 0) from the iris dataset.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

filename = 'data/L1/iris_dataset.csv'
# The iris file has a header row, so header=None is not used here
data = pd.read_csv(filename, sep=None, engine='python')
rawdata = data.values
x = rawdata[:,0]
y = rawdata[:,1]

print("All rows of column 0: ", x)
print("All rows of column 1: ", y)

plt.scatter(x,y)
plt.show()
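Because the iris file has a header row, its columns can also be picked by name instead of position. The sketch below is a hedged illustration: it assumes the header uses the names sepal_length and sepal_width (check your own file), and it reads the first three rows of the iris data from a string rather than from data/L1/iris_dataset.csv. It also shows a side effect worth knowing: because the species column holds text, data.values becomes an array of generic Python objects rather than floats.

```python
import io
import pandas as pd

# First rows of the iris data; the header names are an assumption
iris_text = (
    "sepal_length,sepal_width,petal_length,petal_width,species\n"
    "5.1,3.5,1.4,0.2,setosa\n"
    "4.9,3.0,1.4,0.2,setosa\n"
    "4.7,3.2,1.3,0.2,setosa\n"
)
data = pd.read_csv(io.StringIO(iris_text), sep=None, engine='python')

# Select columns by header name instead of by position
x = data['sepal_length'].to_numpy()
y = data['sepal_width'].to_numpy()
print(x, y)

# The text column forces .values to use the generic 'object' dtype
print(data.values.dtype)
```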

The output looked as follows:

We will do much more than reading data and simple plotting. In the next few lessons we will learn to modify our plots and make our file reading method more efficient. But for now, I hope the plot gives you a sense of reward.

Let me know in the comments or on the contact page if I did not explain a certain part of the code clearly enough. I will be happy to add more clarity to the lesson.

burkealarly.blogspot.com

Source: https://edusecrets.com/lesson-01-reading-data-files-in-python-jupyter-notebook/

Burke Alarly