July 25, 2023
5m 40s

TUTORIAL: How to do data analyses with ChatGPT's Code Interpreter?

ChatGPT's Code Interpreter offers an easy way to start analyzing data. No Python knowledge is required: simply ask questions about the data and ChatGPT will analyze the dataset for you. In this video, Tobi Hann, MOSTLY AI's CEO, demos how powerful this feature is using a synthetic version of the Income dataset from the American Community Survey. To protect data privacy, only upload synthetic versions of your datasets to ChatGPT! #chatgpt #codeinterpreter #syntheticdata #dataanalysis #python #dataprivacy #chatgptdataprotection #dataprotection

Transcript

[00:00:01] Hi, today I want to show you a really cool new feature that ChatGPT introduced called Code Interpreter.

[00:00:06] Actually, we're going to use Code Interpreter not to interpret code, but data. Let's go into that.

[00:00:20] In order to enable that feature, you need to actually go here into your settings.

[00:00:29] There, you go to Beta features and there you see Code Interpreter and enable that. You have to be a Plus user to do that.

[00:00:38] If you're not a Plus user yet, I highly recommend getting an account.

[00:00:43] Once this is enabled, you can go in here and select GPT-4.

[00:00:52] You need to select GPT-4. Then you can select Code Interpreter here.

[00:00:59] Now what's possible is you can actually upload some data. You can upload a file, and I think you can upload up to 100 megabytes of data. Let's do that.

[00:01:07] I have here the Income dataset we're using. It's actually a synthetic version of our Income dataset.

[00:01:18] I just synthesized it with our platform. Now, this file has been uploaded.

[00:01:24] Now actually, I can work with that file and I can ask questions. I'm going to ask ChatGPT to describe the data in the uploaded file and to create some insights and charts.

[00:01:38] What's happening now is that ChatGPT is actually doing some pretty cool work.

[00:01:47] We see here, it detected that it's a data set from the American Community Survey, and it contains information about individuals' income and related demographics in 2018. It gives me a description of the columns. Actually pretty cool.

[00:02:08] Then it says it will perform initial exploratory analysis to provide insights and create charts. We see the data consists of 9,999 rows, 32 columns, most of the features are categorical.

[00:02:24] It now gives me some summary statistics for some of the columns here.

[00:02:27] Income: minimum income 100, maximum income 820,000, average income 65,000, median income 44,000.

[00:02:35] Age, vehicle occupancy, travel time to work, ranging from one minute to 161 minutes. On average it's 26.73 minutes. Also some insights like this, 75% of individuals have a travel time to work of 35 minutes or less.
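The "75% of individuals" statement is a 75th percentile. A minimal sketch of how such a figure can be computed with pandas, assuming a small made-up list of travel times (the values below are hypothetical and are not the dataset from the video):

```python
import pandas as pd

# Hypothetical travel times to work in minutes; made up for illustration.
travel_time = pd.Series([1, 10, 15, 20, 25, 30, 35, 161], name="travel_time")

shortest = travel_time.min()
longest = travel_time.max()
average = travel_time.mean()

# The 75th percentile: about three quarters of the values are at or below it.
q75 = travel_time.quantile(0.75)

# Share of individuals whose travel time is at or below that percentile.
share_at_or_below = (travel_time <= q75).mean()

print(f"range: {shortest} to {longest} minutes, mean: {average:.2f}")
print(f"75th percentile: {q75} minutes, share at or below: {share_at_or_below:.0%}")
```

The same pattern works for any numeric column, for example income or age.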

[00:02:53] What's pretty cool is it actually does all of this with Python. You can see here in the code section, you can actually see the code that was used to create those insights.

[00:03:13] data.info, data.describe, and you see the output here. You can copy this code, put it into your own Jupyter Notebook, and start modifying and expanding it. It really gives you a head start when you want to analyze some data.
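Both calls named here are standard pandas DataFrame methods. A minimal sketch of that first look at a table, assuming a tiny made-up DataFrame in place of the uploaded file (the column names and values are hypothetical, not taken from the video):

```python
import io

import pandas as pd

# Hypothetical stand-in for the uploaded synthetic file; the column names
# and values are made up for illustration.
data = pd.DataFrame({
    "income": [100, 22000, 44000, 65000, 120000, 820000],
    "age": [19, 25, 37, 45, 52, 61],
    "education": ["High school", "Bachelor", "Bachelor",
                  "Master", "High school", "Doctorate"],
})

# data.info() reports column names, dtypes, and non-null counts.
# Writing it to a buffer lets us keep the text as well as print it.
buffer = io.StringIO()
data.info(buf=buffer)
info_text = buffer.getvalue()
print(info_text)

# data.describe() summarizes the numeric columns:
# count, mean, std, min, quartiles, and max.
summary = data.describe()
print(summary)
```

In a notebook, you would typically replace the hand-built DataFrame with a call such as pd.read_csv on your own file.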

[00:03:31] For example, in one of the charts, we see the income distribution, and it's a skewed income distribution. ChatGPT tells me it is heavily right-skewed, meaning most people earn less than the average income, and a small number of people earn significantly more. The 80 000 over here, majority over here.
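A right-skewed distribution can also be checked numerically: the mean sits above the median and the skewness is positive. A minimal sketch, assuming a small made-up income list with one very high earner (hypothetical values, not the dataset from the video):

```python
import pandas as pd

# Hypothetical incomes with one very high earner; made up for illustration.
income = pd.Series(
    [100, 20000, 30000, 40000, 44000, 50000, 60000, 820000],
    name="income",
)

mean_income = income.mean()
median_income = income.median()

# Sample skewness; a positive value indicates a longer right tail.
skewness = income.skew()

# Share of people earning less than the mean.
share_below_mean = (income < mean_income).mean()

print(f"mean: {mean_income:,.0f}, median: {median_income:,.0f}")
print(f"skewness: {skewness:.2f}, share below mean: {share_below_mean:.0%}")
```

A few large values pull the mean up while the median stays near the bulk of the data, which is why most people end up below the average.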

[00:03:52] Age distribution. We see here the age distribution and some more charts, like the top 10 categories for education: bachelor's degree, high school diploma, and so forth. This is all done automatically.
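A "top 10 categories" chart is usually built from a frequency count. A minimal sketch with pandas, assuming a short made-up education column (the labels and counts are hypothetical):

```python
import pandas as pd

# Hypothetical education column; labels and counts are made up for illustration.
education = pd.Series(
    ["Bachelor's degree"] * 4
    + ["High school diploma"] * 3
    + ["Master's degree"] * 2
    + ["Doctorate"],
    name="education",
)

# value_counts() sorts categories by frequency, most common first;
# head(10) keeps at most the ten most frequent ones.
top_categories = education.value_counts().head(10)
print(top_categories)
```

The resulting Series can be passed to a plotting library as a bar chart if a visual is needed.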

[00:04:06] Again, I can go here and get the Python code that's used to create those charts. That's pretty cool. I can copy that and then use it. I can also start asking more specific questions. For example, I might be interested in whether there is an income difference between males and females.

[00:04:37] Now ChatGPT will answer that question. There we have a chart that shows me the box plots, the distribution of income by gender, and we see that yes, there is a difference.

[00:04:53] The mean income for males is approximately 76,000,

[00:04:56] whereas for females it's only 53,000.

[00:05:01] It even gives me some additional information.
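A group comparison like this one comes down to a groupby. A minimal sketch with pandas, assuming a tiny made-up table whose values were chosen so that the group means match the figures quoted above (the column names and individual rows are hypothetical):

```python
import pandas as pd

# Hypothetical rows, chosen so the group means equal the quoted
# 76,000 and 53,000; not the dataset from the video.
data = pd.DataFrame({
    "gender": ["Male", "Male", "Male", "Female", "Female", "Female"],
    "income": [40000, 76000, 112000, 30000, 53000, 76000],
})

# Mean and median income per gender.
by_gender = data.groupby("gender")["income"].agg(["mean", "median"])
print(by_gender)

# Difference in mean income between the two groups.
mean_gap = by_gender.loc["Male", "mean"] - by_gender.loc["Female", "mean"]
print(f"difference in mean income: {mean_gap:,.0f}")
```

A box plot of the same grouping shows the full distributions rather than only the means, which is what the chart in the video does.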

[00:05:05] Isn't that awesome?

[00:05:08] Very, very cool ChatGPT Code Interpreter.

[00:05:13] I think also a fantastic use case for synthetic data because,

[00:05:17] yes, you don't want to upload sensitive data to ChatGPT even if you disable

[00:05:24] the history functionality.

[00:05:27] It's probably going to be a good idea and best practice to upload

[00:05:31] some synthetic data to keep it safe.

[00:05:34] Have fun exploring that, and see you soon.

[00:05:38] Bye.
