Lately, there have been several efforts to incorporate machine learning into experimental measurements. These are generally well known in the community, especially the quantum one (see here for example). While this type of work is currently ‘hot’, I decided to write a small post here about the little cousin of ML, which is automation, that is, extracting information from large datasets of experiments.

This came about from my recently published work done at Grenoble, in which I had the chance to work with a large number of well-organized experiments. And I think it goes nicely with my previous post which is about automation in materials simulation.

Here, instead, I will present some common methods of extracting pinch-off voltages using Python. I did a previous post on a similar subject. Together they can be quite handy for quickly extracting information from 1D data. Of course, they can also be generalized to 2D, but here we focus on device measurements and not spectroscopy. In fact, I handled the 2D plots I analysed as a list of 1D data, so I directly applied similar routines instead of 2D ones.

As always, this is a simple tutorial that only requires minimal knowledge of derivatives. At the end of the post, I will also give you the paper where this is put into practice, as well as data from lots of measurements on quantum point contacts (QPC) that you can practise on.

But let’s see this in examples:

## Interpolation

First, we need to look at the type of data that we have. For simple cases, numpy or scipy interpolation will suffice. Looking through the list of cases, you may be able to locate your type of data. This was sufficient, for example, for the pinch-off voltages of 1D current-voltage data (figure below). The new way of locating the ‘bumps’ here is that I used a constant percentage below the peak of dI/dV (if you don’t remember the rules of differentiation, there are plenty of resources to help you). But maybe, reading carefully, you can find an even more accurate way. The equivalent piece of code is:

`f = interpolate.interp1d(I, V, fill_value="extrapolate", bounds_error=False, kind="linear")`

where you can tweak the settings to get other types of interpolation like cubic etc., and of course, extrapolate if the value that we seek to find is outside the given data range.
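To make the one-liner above concrete, here is a minimal sketch on synthetic data. The curve shape, the names `V0_TRUE` and `WIDTH`, and the criterion "pinch-off is where the current falls to 1% of its maximum" are all assumptions made for illustration; they are not the criterion used in the paper.

```python
import numpy as np
from scipy import interpolate

# Synthetic, strictly increasing I-V curve (illustrative only, not real data).
V0_TRUE, WIDTH = -1.0, 0.05          # hypothetical onset voltage and smoothing width
V = np.linspace(-2.0, 0.0, 401)
I = np.log1p(np.exp((V - V0_TRUE) / WIDTH))

# Interpolate V as a function of I, exactly as in the one-liner above.
f = interpolate.interp1d(I, V, fill_value="extrapolate", bounds_error=False, kind="linear")

# Assumed criterion: pinch-off is where the current drops to 1% of its maximum.
I_threshold = 0.01 * I.max()
V_pinch = float(f(I_threshold))
print(f"pinch-off voltage ~ {V_pinch:.3f} V")
```

Interpolating V as a function of I only behaves well when the current values are distinct, so a long flat stretch of identical readings is better cut out before calling `interp1d`.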

## Linear Regression

The same work can be done with linear regression. It is slightly heavier, but I found it worked better for the same problem described above.

For various reasons, interpolation (extrapolation) might not work at all. For example, when 2D data are decomposed into a list of 1D data, some of the curves may not reach zero current. Take a look at the figure below, where the experimentalists were biasing one gate of a quantum point contact (V_{top}) at different values of the second gate voltage (V_{bot}). Here, at higher V_{bot}, V_{top} does not fully deplete the channel. In such cases, it also helps to cut out parts of the data by setting limits on either the x or the y values.
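As a rough illustration of combining a cut with linear regression, the sketch below keeps only the points whose current lies between two fractions of the maximum, fits a straight line to them, and extrapolates that line to zero current. The helper `pinch_off_from_fit`, its `lo` and `hi` limits, and the synthetic curve are hypothetical choices for this example, not the routine from the paper.

```python
import numpy as np
from scipy import stats

def pinch_off_from_fit(V, I, lo=0.1, hi=0.9):
    """Fit a line to the part of I(V) lying between lo and hi times the
    maximum current, then extrapolate that line to I = 0."""
    mask = (I > lo * I.max()) & (I < hi * I.max())   # limits on the y values
    fit = stats.linregress(V[mask], I[mask])
    return -fit.intercept / fit.slope

# Synthetic curve: zero below -1.2 V, linear rise, saturation at 1.5 (illustrative).
V = np.linspace(-2.0, 0.0, 201)
I = np.clip(2.0 * (V + 1.2), 0.0, 1.5)

V_pinch = pinch_off_from_fit(V, I)
print(f"pinch-off voltage ~ {V_pinch:.3f} V")
```

The same mask could just as well be built from limits on `V` if it is the x range that needs trimming.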

For the 2D case, the full data are shown below. You can see that it does not even make sense to search at higher voltages.

With regression, you can also choose between linear and polynomial fits. For example, take a look at the data below, which are taken from 3D simulations. The behaviour of the charge density (at a point or integrated) as a function of gate voltage under the gate differs from that under the surface between the two gates in a QPC. While simple interpolation may work for the orange curve, for the blue curve I could get nicer results with polynomial regression.
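A polynomial fit of this kind can be sketched with numpy alone. The cubic test curve, the degree, and the idea of reading off the voltage where the fitted density crosses zero are assumptions for illustration; scikit-learn offers comparable tools if you prefer that route.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Synthetic charge density vs gate voltage (illustrative cubic, zero at -1.5 V).
V_gate = np.linspace(-2.0, 0.0, 41)
density = (V_gate + 1.5) * (V_gate**2 + 1.0)

# Least-squares polynomial fit; the degree is a choice you would tune on real data.
poly = Polynomial.fit(V_gate, density, deg=3)

# Keep only real roots that fall inside the measured voltage range.
roots = poly.roots()
real_roots = roots[np.abs(roots.imag) < 1e-8].real
in_range = real_roots[(real_roots >= V_gate.min()) & (real_roots <= V_gate.max())]
V_depletion = float(in_range[0])
print(f"density crosses zero at ~ {V_depletion:.3f} V")
```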

This has now passed into the realm of supervised learning, so you can find more ideas on the scikit-learn website. If you are a student or someone who has a bit of time, you can try out different techniques; it is quite good practice.

That being said, this does not mean that interpolation cannot work nicely with the right kind of data or the right kind of handling. Take a look at the plot below, where I locate the pinch-off voltages in 2D current-voltage data using interp1d from scipy.interpolate after cutting out the first N points of the array for each V_{bot}. (The color of each dashed line corresponds to the color of the solid line.) Quite neat, huh?
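The row-by-row handling described above could look roughly like the following. The synthetic 2D array, the values of `N_SKIP` and `I_THRESHOLD`, and the choice to return NaN for curves that never get below the threshold are assumptions for this sketch.

```python
import numpy as np
from scipy import interpolate

WIDTH, N_SKIP, I_THRESHOLD = 0.05, 20, 0.2   # hypothetical settings

# Synthetic 2D data: one I(V_top) curve per V_bot value (illustrative only).
V_top = np.linspace(-2.0, 0.0, 401)
V_bot_values = np.array([-0.4, -0.2, 0.0])
onsets = -1.0 - 0.5 * V_bot_values           # assumed shift of the onset with V_bot
current_2d = np.log1p(np.exp((V_top[None, :] - onsets[:, None]) / WIDTH))

pinch_offs = []
for I_row in current_2d:
    # Cut out the first N_SKIP points of each 1D curve before interpolating.
    I_cut, V_cut = I_row[N_SKIP:], V_top[N_SKIP:]
    if I_cut.min() > I_THRESHOLD:
        # The channel is never depleted in this curve, so there is nothing to find.
        pinch_offs.append(np.nan)
        continue
    f = interpolate.interp1d(I_cut, V_cut, fill_value="extrapolate",
                             bounds_error=False, kind="linear")
    pinch_offs.append(float(f(I_THRESHOLD)))

pinch_offs = np.array(pinch_offs)
print(dict(zip(V_bot_values.tolist(), np.round(pinch_offs, 3).tolist())))
```

Treating the 2D map as a plain list of 1D curves keeps the routine identical to the single-curve case, so only the loop and the cut are new.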

As promised, you can find the dataset on Zenodo here. If you use it, please don’t forget to reference the work that created the data:

E. Chatzikyriakou, J. Wang, L. Mazzella, A. Lacerda-Santos, M. Cecilia da Silva Figueira, A. Trellaxis, S. Birner, T. Grange, C. Bäuerle, X. Waintal, “*Unveiling the charge distribution of a GaAs-based nanoelectronic device: A large experimental data-set approach*”, Phys. Rev. Res., Dec 2022, doi: 10.1103/PhysRevResearch.4.043163

This project received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 840550.