ChatGPT on Exploratory Data Analysis
ChatGPT has recently gained considerable attention, with widespread speculation that it may soon automate numerous tasks, potentially including those of data analysts. Many research papers are being published that highlight ChatGPT's impressive productivity compared with data analysts, and they present quite compelling arguments. Yet, as a data analyst myself, my natural curiosity demands firsthand experience to understand whether my role could genuinely be replaced by such technology. Before we dive in, here is a quick summary table.
This summary table certainly does not capture the entirety of EDA (categorical data analysis, for example, is omitted), but I think it provides a good understanding of what can and cannot be achieved.
If you know the enemy and know yourself, you need not fear the result of a hundred battles
- Art of War
It appears that ChatGPT embodies a sense of humility and acknowledges its own limitations. It learns rapidly through iterations and displays discipline in its approach. Interestingly, it tries to lower my mental guard by telling me that my job is unlikely to be replaced. If I had even the slightest belief in that statement, shame on me, because by the time I am replaced I will have no leverage left and will be out of the game. You can't fool me!
Benchmark
In the past, I wrote an article on Exploratory Data Analysis (EDA), which I now intend to use as a yardstick to assess the capabilities of ChatGPT. My hypothesis is that if ChatGPT can reproduce every aspect of my initial article, it would suggest that I am very close to being replaced.
https://medium.com/mlearning-ai/a-practical-guide-to-exploratory-data-analysis-fabbac2bcad
My article was divided into data structure, graphical illustration of feature correlations, and outliers. Let’s test each of these sections individually using ChatGPT and see what is good, what is bad, and what is ugly.
Data Structure
DataFrame
I was not expecting the results to be so wordy. Instead, I expected a dataframe with 7 rows, similar to the output of fraud0.head(). Because of this, I am going to rate it Bad.
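For comparison, here is a minimal sketch of the kind of tabular preview I had in mind. The data below is an invented stand-in for fraud0 (the column names and values are assumptions, not the real dataset). Note that head() returns 5 rows by default, so 7 rows has to be requested explicitly.

```python
import pandas as pd

# Hypothetical stand-in for the fraud0 dataset; columns and values are invented.
fraud0 = pd.DataFrame({
    "amount": [120.0, 35.5, 980.0, 12.25, 430.0, 75.0, 66.6, 5000.0, 18.0, 240.0],
    "type": ["PAYMENT", "TRANSFER", "CASH_OUT", "PAYMENT", "DEBIT",
             "PAYMENT", "TRANSFER", "CASH_OUT", "PAYMENT", "DEBIT"],
})

# head() shows the first 5 rows unless n is given; ask for 7 explicitly.
preview = fraud0.head(7)
print(preview)
```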
Descriptive Analysis
Again, I was expecting to see a nice, well-structured table, but this time not only did it not show me the dataframe, it completely omitted the core part, i.e. the descriptive stats. This is Ugly.
The other simple descriptive statistics are accurate, and although the output is not pleasing to the eye, it gets the job done. But this is too rudimentary, so I will say it is Bad.
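As a point of reference, the well-structured table I was hoping for is roughly what pandas produces with describe(). This is only a sketch on a made-up dataframe (the column names are assumptions), not the output from my original article.

```python
import pandas as pd

# Hypothetical numeric columns standing in for the real dataset.
fraud0 = pd.DataFrame({
    "amount": [10.0, 20.0, 30.0, 40.0],
    "old_balance": [100.0, 200.0, 300.0, 400.0],
})

# describe() returns count, mean, std, min, quartiles and max per numeric column.
summary = fraud0.describe()
print(summary)
```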
Data Manipulation
Feature creation
This is exactly how I imagined it: I specify the requirements and GPT delivers. This is Good!
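To make "feature creation" concrete, here is a small sketch of the sort of derived columns one might ask for. The column names and the two features are hypothetical examples, not the ones from my test.

```python
import pandas as pd

# Hypothetical transaction data; all column names are assumptions.
df = pd.DataFrame({
    "amount": [50.0, 1500.0, 300.0],
    "old_balance": [200.0, 1500.0, 1000.0],
    "new_balance": [150.0, 0.0, 700.0],
})

# New numeric feature: how much the balance dropped after the transaction.
df["balance_change"] = df["old_balance"] - df["new_balance"]

# New boolean feature: did the transaction leave the account empty?
df["emptied_account"] = df["new_balance"] == 0

print(df)
```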
Time Manipulation
The worst headache when working with any programming language is time data. It is hard to deal with because each language has its own set of rules. But ChatGPT made it look like a breeze! This is easily Good!
Data Type Conversion
Simple Conversion
A simple conversion, such as converting a numeric column to a string, is easy, and GPT does a really nice job. So I will give it Good.
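In pandas, this kind of conversion is usually a one-liner with astype. A small sketch, assuming a made-up numeric ID column:

```python
import pandas as pd

# Hypothetical numeric identifier column.
df = pd.DataFrame({"account_id": [1001, 1002, 1003]})

# Convert the numbers to strings, e.g. so they are treated as labels, not quantities.
df["account_id_str"] = df["account_id"].astype(str)

print(df["account_id_str"].tolist())
```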
Time Conversion
A simple conversion is doable with GPT, but it still has a hard time converting time stored as a string into a proper time format. I guess I will give this a Bad.
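Done manually, the string-to-time conversion is typically handled with pd.to_datetime. The sketch below assumes a particular string layout (the format string is my assumption, and your data may differ) and uses errors="coerce" so that unparseable values become NaT instead of raising.

```python
import pandas as pd

# Hypothetical time strings, including one bad value.
raw = pd.Series(["2023-01-02 08:30:00", "2023-01-07 12:00:00", "not a date"])

# An explicit format avoids ambiguous parsing; coerce turns failures into NaT.
parsed = pd.to_datetime(raw, format="%Y-%m-%d %H:%M:%S", errors="coerce")

print(parsed)
```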
Feature Relations
Correlations
If you just want a quick read on the correlation between 2 variables, GPT will help you get the answer, but it will not for anything more than 2. So this is Bad.
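For more than 2 variables, the usual manual route is a full correlation matrix. Here is a minimal sketch with invented columns. DataFrame.corr() computes pairwise Pearson correlation by default.

```python
import pandas as pd

# Hypothetical numeric features; values are chosen so the relationships are obvious.
df = pd.DataFrame({
    "amount": [1, 2, 3, 4],
    "fee": [2, 4, 6, 8],          # rises in lockstep with amount
    "days_left": [4, 3, 2, 1],    # falls as amount rises
})

# Pairwise correlation between every pair of columns.
corr = df.corr()
print(corr)
```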
Univariate Plots
Undeniably, one of the strongest points of ChatGPT is its proficiency in generating charts. Gone are the days of crafting verbose Python code. Now, all you need to do is instruct ChatGPT regarding your requirements. This is Good.
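For a sense of the hand-written code this saves, here is a minimal matplotlib sketch of a univariate plot (a histogram). The data and the column name are assumptions, and this is not the chart from my test.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical transaction amounts.
df = pd.DataFrame({"amount": [12, 35, 40, 41, 58, 75, 80, 120, 240, 430]})

fig, ax = plt.subplots()
# hist() returns the bin counts, the bin edges, and the drawn bars.
counts, bin_edges, _ = ax.hist(df["amount"], bins=5)
ax.set_xlabel("amount")
ax.set_ylabel("count")
ax.set_title("Distribution of amount")
# In a notebook, plt.show() would display the figure.
```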
Multivariate Plots
Again, this looks amazing. Good.
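A multivariate plot written by hand might look like the sketch below: a scatter plot of two features, colored by a third. All column names and values are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical features plus a binary label.
df = pd.DataFrame({
    "old_balance": [200, 1500, 1000, 50, 3000, 800],
    "amount": [50, 1500, 300, 45, 2900, 100],
    "is_fraud": [0, 1, 0, 0, 1, 0],
})

fig, ax = plt.subplots()
# x and y carry two variables; color encodes the third.
points = ax.scatter(df["old_balance"], df["amount"], c=df["is_fraud"], cmap="coolwarm")
ax.set_xlabel("old_balance")
ax.set_ylabel("amount")
fig.colorbar(points, ax=ax, label="is_fraud")
```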
Conclusion
While it may currently be challenging for ChatGPT to entirely substitute a data analyst, given its limitations in performing certain tasks and the need for supervision in others, it undoubtedly holds potential to greatly enhance productivity for those adept at utilizing its capabilities.