ChatGPT on Exploratory Data Analysis

The Good, The Bad and The Ugly

Xiao
5 min read · Jun 7, 2023

ChatGPT has recently gained considerable attention, with widespread speculation that it may soon automate numerous tasks, potentially including those of data analysts. Many research papers highlight ChatGPT's impressive productivity compared with data analysts and present quite compelling arguments. Yet, as a data analyst myself, my natural curiosity demands firsthand experience to understand whether my role could genuinely be replaced by such technology. Before we dive in, here is the quick summary table.

This summary table certainly does not capture the entirety of EDA (categorical data analysis, for example, is omitted), but I think it gives a good sense of what can and cannot be achieved.

If you know the enemy and know yourself, you need not fear the result of a hundred battles

- Art of War

It appears that ChatGPT embodies a sense of humility and acknowledges its own limitations. It learns rapidly through iterations and displays discipline in its approach. Interestingly, it tries to lower my mental guard by telling me that my job is unlikely to be replaced. If I had the slightest belief in that statement, shame on me, because by the time I am replaced I will have no leverage left and will be out of the game. You can't fool me!

Benchmark

In the past, I wrote an article on Exploratory Data Analysis (EDA), which I now intend to use as a yardstick to assess the capabilities of ChatGPT. My hypothesis is that if ChatGPT can reproduce every aspect of my original article, I am very close to being replaced.

https://medium.com/mlearning-ai/a-practical-guide-to-exploratory-data-analysis-fabbac2bcad

My article was divided into data structure, graphical illustration of feature correlations, and outliers. Let’s test each of these sections individually with ChatGPT and see what is the good, the bad and the ugly.

Data Structure

DataFrame

I was not expecting the results to be so wordy. Rather, I expected a dataframe with 7 rows, similar to what fraud0.head() produces. Because of this, I am going to give it Bad.
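For comparison, this is roughly the kind of output I had in mind. It is only a minimal sketch: the real fraud0 dataset is not reproduced here, so the column names and values below are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for fraud0; columns and values are invented.
fraud0 = pd.DataFrame({
    "amount": [12.5, 80.0, 5.25, 230.0, 41.0, 9.99, 150.0, 60.0],
    "merchant": ["A", "B", "A", "C", "B", "A", "C", "B"],
    "is_fraud": [0, 0, 0, 1, 0, 0, 1, 0],
})

# head(n) returns the first n rows as a DataFrame (the default is n=5),
# which prints as a compact table rather than a paragraph of text.
preview = fraud0.head(7)
print(preview)
```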

Descriptive Analysis

Again, I was expecting a nicely structured table, but this time not only did it fail to show me the dataframe, it completely omitted the core part, i.e. the descriptive stats. This is Ugly.

Other simple descriptive statistics are accurate; although not pleasing to the eye, the output gets the job done. But this is too rudimentary, so I will say it is Bad.
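The well structured table I was hoping for is what pandas gives you with describe(). A minimal sketch, again assuming a made-up stand-in for fraud0 (hypothetical columns and values):

```python
import pandas as pd

# Hypothetical stand-in for fraud0; columns and values are invented.
fraud0 = pd.DataFrame({
    "amount": [10.0, 20.0, 30.0, 40.0],
    "is_fraud": [0, 0, 1, 0],
})

# describe() summarises each numeric column: count, mean, std, min,
# quartiles and max, returned as a DataFrame that prints as a table.
summary = fraud0.describe()
print(summary)
```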

Data Manipulation

Feature creation

This is exactly how I imagined it: I specify the requirements and GPT delivers. This is Good!
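To give a feel for what "specify the requirements" means in code, here is a small sketch of feature creation in pandas. The features themselves (a large-transaction flag with a threshold of 100, and a log-scaled amount) are hypothetical examples, not the ones from my original article.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for fraud0; values are invented.
fraud0 = pd.DataFrame({"amount": [5.0, 120.0, 999.0]})

# Boolean feature: is the transaction above an assumed threshold of 100?
fraud0["is_large"] = fraud0["amount"] > 100

# Numeric feature: log(1 + amount), a common way to tame skewed amounts.
fraud0["log_amount"] = np.log1p(fraud0["amount"])

print(fraud0)
```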

Time Manipulation

The worst headache in any programming language is working with time; it is hard to deal with because each language has its own set of rules. But ChatGPT made it look like a breeze! This is easily Good!
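For readers who want to see what time manipulation looks like by hand, here is a minimal pandas sketch. The timestamp column and its values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical timestamps; the real dataset's time column is not shown here.
fraud0 = pd.DataFrame({
    "timestamp": pd.to_datetime(["2023-06-07 09:30:00", "2023-06-10 23:15:00"]),
})

# The .dt accessor exposes the parts of each timestamp.
fraud0["hour"] = fraud0["timestamp"].dt.hour
fraud0["weekday"] = fraud0["timestamp"].dt.day_name()
# dayofweek counts Monday as 0, so 5 and 6 are Saturday and Sunday.
fraud0["is_weekend"] = fraud0["timestamp"].dt.dayofweek >= 5

print(fraud0)
```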

Data Type Conversion

Simple Conversion

Simple conversions, such as converting numeric to string, are easy, and GPT does a really nice job. So I will give it Good.
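In pandas, this kind of conversion is a one-liner with astype. A small sketch, assuming a hypothetical account_id column:

```python
import pandas as pd

# Hypothetical numeric identifier column.
fraud0 = pd.DataFrame({"account_id": [1001, 1002, 1003]})

# astype(str) converts every value to its string representation.
fraud0["account_id_str"] = fraud0["account_id"].astype(str)

print(fraud0["account_id_str"].tolist())
```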

Time Conversion

A simple conversion is doable with GPT, but it still has a hard time converting time stored as a string into a time format. I guess I will give this a Bad.
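Doing this conversion yourself usually comes down to telling pandas the exact format of the string. A minimal sketch, assuming day/month/year strings (the format and the sample values are hypothetical):

```python
import pandas as pd

# Hypothetical raw strings in day/month/year order, plus one bad value.
raw = pd.Series(["07/06/2023 14:05", "31/12/2022 08:00", "unknown"])

# An explicit format avoids day/month ambiguity; errors="coerce" turns
# anything that does not match into NaT instead of raising an error.
parsed = pd.to_datetime(raw, format="%d/%m/%Y %H:%M", errors="coerce")

print(parsed)
```

Checking how many values came back as NaT is a quick way to see whether the assumed format actually matches the data.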

Feature Relations

Correlations

If you just want a quick read on the correlation between 2 variables, GPT will get you the answer, but it will not for anything more than 2. So this is Bad.
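For more than 2 variables, the usual tool is a full correlation matrix. A minimal sketch with three made-up numeric columns (names and values are hypothetical, chosen so the result is easy to check):

```python
import pandas as pd

# Hypothetical numeric columns.
fraud0 = pd.DataFrame({
    "amount": [1.0, 2.0, 3.0, 4.0],
    "fee": [2.0, 4.0, 6.0, 8.0],       # rises exactly with amount
    "balance": [4.0, 3.0, 2.0, 1.0],   # falls exactly as amount rises
})

# corr() computes the pairwise Pearson correlation for every column pair.
corr = fraud0.corr()
print(corr)
```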

Univariate Plots

Undeniably, one of the strongest points of ChatGPT is its proficiency in generating charts. Gone are the days of crafting verbose Python code. Now, all you need to do is tell ChatGPT what you need. This is Good.
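As a reminder of the code this replaces, here is a rough sketch of a univariate histogram in matplotlib. The amount column, its values, the bin count and the output file name are all assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for a numeric column of fraud0.
fraud0 = pd.DataFrame({"amount": [5, 12, 18, 25, 40, 41, 60, 95, 120, 300]})

fig, ax = plt.subplots()
# hist returns the count per bin and the bin edges alongside the drawing.
counts, bin_edges, _ = ax.hist(fraud0["amount"], bins=5)
ax.set_xlabel("amount")
ax.set_ylabel("count")
ax.set_title("Distribution of amount")
fig.savefig("amount_hist.png")
```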

Multivariate Plots

Again, this looks amazing. Good.
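The hand-written equivalent of a multivariate plot could be a scatter matrix, which pandas ships in pandas.plotting. This is only a sketch under assumptions: the three columns, their values and the output file name are made up, and it is not necessarily the chart type ChatGPT produced.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so this runs without a display
import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical numeric columns.
fraud0 = pd.DataFrame({
    "amount": [5.0, 12.0, 40.0, 95.0, 120.0, 300.0],
    "fee": [0.5, 1.0, 2.5, 4.0, 6.0, 12.0],
    "balance": [900.0, 850.0, 700.0, 400.0, 350.0, 50.0],
})

# One scatter plot per pair of columns, with histograms on the diagonal.
axes = scatter_matrix(fraud0, figsize=(6, 6), diagonal="hist")
axes[0, 0].figure.savefig("scatter_matrix.png")
```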

Conclusion

While it may currently be challenging for ChatGPT to entirely substitute a data analyst, given its limitations in performing certain tasks and the need for supervision in others, it undoubtedly holds potential to greatly enhance productivity for those adept at utilizing its capabilities.
