ChatGPT (3.5) | Up to the Challenge, Only if AI Can Get Over Numerical and Gender-based Biases – AI and Film 2024 • FILM 485 • PROFESSOR GREGORY ZINMAN

Any attempt to have an Artificial Intelligence write a piece of script requires some manipulation of the AI system. An input of unordered words with no pronouns, prepositions, or other grammatical markers will not produce the desired result. The biggest recurring issue that I encountered had to do with the input itself, combined with how the Artificial Intelligence, in this case ChatGPT (3.5), functions. In the previous version of this overarching project, which attempted to catalyze some form of creativity in this free ChatGPT prototype, a common issue appeared in all the work the Artificial Intelligence produced: the so-called ‘spark of creativity’ was not only absent, but its absence actively diminished the work. Of course, this is not unique to this form of Artificial Intelligence, or arguably to AI at all, as there are thousands upon thousands of films that are never screened beyond an opening weekend; still, the clearly formula-based approach was at the forefront of the writing. The issue of formulaic, vague writing with weak, one-dimensional characters was at its most obvious, and the supposed capabilities of an AI-based script-writing tool at their worst, when the ChatGPT input was a sole genre, such as a Noir or Epic film, but the issue was still clearly present when the input was a director. The conclusion from previous attempts to spawn an original script from this Artificial Intelligence has been not only disheartening but also serves as a piece of evidence for AI’s inability to produce content worthy of a wide release. Whether there is a solution to this clear lack of creative resources is what this analysis hopes to determine, at the very least theoretically.

In these new attempts to have ChatGPT create a plausibly original script, the issue at the forefront of ChatGPT’s previous scripts, namely those from the input of PTA (Paul Thomas Anderson), is an all too evident reliance on whatever is the most popular result online for that name. The formulaic nature of ChatGPT’s previous work was the most apparent issue, but this may be due to the nature of ChatGPT’s design. While countless articles have been written about how ChatGPT functions, it is clear that the system depends on the workings of an LLM, or Large Language Model (Frieder, Pinchetti, Chevalier, et al., 2023). That model, unsurprisingly, favours whatever appears most often in the data called upon by the input. The most obvious example from the previous work came with the final choice of Paul Thomas Anderson as the author of the AI-generated script, which resulted in both characters and a narrative that strikingly resemble his most recent release, Licorice Pizza (2021). At first glance, this may seem rather odd to a deep-seated cinephile, who would expect his more widely praised work like There Will Be Blood (2007) or The Master (2012) to be used, given their steady growth in popularity and critical status since their release. Still, Licorice Pizza (2021) was clearly the main body of work on which ChatGPT based its script-writing decisions. This ties into the limits of what ChatGPT can do, as a combined University of Cambridge, Oxford, and Vienna study suggests in its examination of ChatGPT’s current analytical capabilities.

“Regarding LLMs [Large Language Models], recent ones, for instance, PaLM [24] (released in 2022), are tested only on elementary-level mathematical reasoning datasets, such as the MathQA or GSM8K datasets [25, 26]. We hypothesize that this is due to a lack of advanced-level natural language mathematics datasets” (Frieder, Pinchetti, Chevalier, et al., 2023).

The conclusion presented above may seem like an automatic detriment to the creative capabilities of ChatGPT, but a lack of natural language mathematics datasets may imply the inverse of what most humans inclined to creativity, which I hope includes myself, expect. The call for advanced-level natural language mathematics datasets is undoubtedly a vague conclusion, but put into context, it suggests that the engine behind ChatGPT, and thus its ability to be less rigidly formulaic in its outputs, may improve once this step is taken. According to this combined Vienna, Cambridge, and Oxford study, as I read it, the datasets that the version of ChatGPT used here can handle are not up to par with the most contemporary data analytics software: in the previously mentioned PaLM case, a dataset with 540 billion parameters could not be entirely processed by ChatGPT, which correctly solved only 54% of the data. While a dataset with 540 billion parameters is almost unimaginable, when set against the endlessly increasing size of the internet’s key search engines and its published articles, videos, books, opinion pieces, etc., this inability to read the entire dataset can be a direct drawback to the quality of ChatGPT’s output. Therefore, while it may seem odd to fault ChatGPT for failing to read such a mammoth dataset in its entirety, that failure carries over directly to ChatGPT’s capacity to find and create content which not only matches but builds upon the given input. Thus, a response modelled on Licorice Pizza (2021) may at first seem like an inability to use a plethora of data to create an ‘original’ script, but may more accurately be an indication of the nascent state of ChatGPT (3.5) at the moment.

In order to get beyond this limitation of ChatGPT’s current LLM and produce a script closer to an ‘original’ work than to a recycled one, the input, meaning the given director, genre, setting, etc., must not consist of easily researchable topics, genres, or people. To discourage ChatGPT from relying purely on the law of large numbers combined with its LLM, which yields an output that follows popular trends rather than a narrative structure or character, the topics, genres, and people must become increasingly niche and unknown. Thus the first step is to move away from conventional Hollywood. While a choice like Paul Thomas Anderson is not the equivalent of a legendary name like Steven Spielberg or a contemporary box-office draw like Denis Villeneuve, he still falls within the category of a Hollywood studio director, regardless of his clearly auteur material. This leads ChatGPT (3.5) directly to the most recent, most-clicked, and most-viewed articles, websites, and discussion boards online, since it is, as mentioned, lines of code whose duty is to find the most likely response to whatever words are given as input variables (PTA, Noir, etc.). These functions are what directly led ChatGPT to create such dull scripts, at least from my point of view, in the first half of this course. Accordingly, published research has concluded that ChatGPT lacks the creative ability of its human counterparts because it is a software processing machine rather than a human being, a species that has long argued for its own universal significance through the idea of a soul (Shidiq, 2023). The solution to this lack of creativity, therefore, is to embrace the unconventional, little-known auteur filmmakers who, thankfully, are scattered across the medium with great wells of undiscovered richness and depth in their work.

In order to kindle any sparks of creativity within ChatGPT (3.5), what appears more abstract may be the better choice: it requires ChatGPT (3.5) both to devote its LLM analytical skills to a little-known, rarely researched and discussed topic or person, and to interpret cinematic concepts of visual storytelling through the published data surrounding that abstract work. Furthermore, I wanted to understand how ChatGPT (3.5) behaves differently when it is drawing on female directors and writers rather than their male counterparts, particularly in cinema. The best options were therefore non-American, non-conventional filmmakers and storytellers, to push this fledgling Artificial Intelligence toward an outwardly ‘original’ script, so widely loved directors like Greta Gerwig and Sofia Coppola had to be side-stepped. The options to incentivize a creative output from ChatGPT are international directors whose auteur work reshaped their genres; choices like Larisa Shepitko, Věra Chytilová, Mati Diop, and Chantal Akerman seemed more than adequate. Still, these auteur directors, besides Mati Diop, all share a European identity, and thus a scope beyond them is needed to identify how ChatGPT (3.5) classifies and subsequently diminishes the work of both women and non-white directors. The best choices here likewise came from the avant-garde side of the medium, whose visual storytelling still overshadows nearly any other remarkable quality of their work, so Djibril Diop Mambéty, Andrzej Żuławski, and Nagisa Ōshima seemed more than adequate choices as well. Thus, the goal of using the aforementioned writer-directors as input is to force ChatGPT’s hand into creating an ‘original’ script that cannot be based purely upon the popularity of a specific film, scene, or theme of a director, but instead offers some form of conjectural proof of a creative process taking place.

The first steps towards a script output are, unsurprisingly, extremely simple. Prefacing the director’s name with ‘script by’ is the most obvious first step; however, it does not guarantee that the script will hold an element of originality rather than purely recycled content. The results of each director’s input reveal an aspect of ChatGPT’s analysis and its inherent biases as an LLM. In order to assess the ability of ChatGPT (3.5) to turn the descriptive content it collects into a piece of ‘original’ script, rather than simply using functions like Newton’s Sum Method for quadratics as a means of finding the most commonly researched and published work from each filmmaker, the scripts must be analyzed on their ability to create specific visual storytelling rather than simply indicating a character’s or scene’s emotional tone through often clunky descriptors. In achieving visual storytelling rather than pure tonal description, the choice of director in the input matters a great deal. What is most interesting is that it is not the work of auteur filmmakers close to the edge of what is defined as ‘cinema’, such as Akerman’s Jeanne Dielman, 23 quai du Commerce, 1080 Bruxelles, that achieves the greatest amount of pure visual storytelling in the scripts; rather, it is the work of relatively unknown directors whose filmography is built on narratives with meaning layered within each character and narrative decision, like that of Djibril Diop Mambéty. Surprisingly, neither of the two ChatGPT-written scripts quoted below contains any dialogue, but comparing their imagery and emotional tone begins to reveal where this AI relies more, or less, upon the most popular and viewed work of each filmmaker.

“Ramatou’s gaze lingers on the children for a moment, a bittersweet smile playing at the corners of her lips, before she continues on her solitary journey.” -ChatGPT (3.5)’s Djibril Diop Mambéty

“As Marie stares into the distance, a sense of restlessness begins to stir within her, a longing for something beyond the confines of her solitary existence.” -ChatGPT (3.5)’s Chantal Akerman

When analyzing these two scripts, what becomes abundantly clear is a continued reliance on descriptors of things that are not on-screen actions to form an emotional tone which cannot be visualized or delivered. The generated Djibril Diop Mambéty script still relies on emotional descriptors to generate some tone, but the comparison with the Chantal Akerman script shows a clear difference. It is most obvious in the action lines that respond to each woman’s stare: Marie’s stare leads ChatGPT to rely yet again upon emotional descriptors to set a tone, without a material description of the room or the character’s actions, while Ramatou’s does the exact opposite. Where ChatGPT’s Akerman script references a vague ‘longing’, Mambéty’s gives direct action lines of Ramatou’s gaze and smile. Still, alarming similarities can be found in both scripts, with the word ‘solitary’ recreated to the point that its placement is exactly the same. This clear copy shows a definite inability to be wholly ‘original’ in scriptwriting, and it extends beyond the page into the narratives themselves: Akerman’s script of a lone woman in her home acutely resembles her ’75 classic in the same vein. The direct copying of each director’s most popular content doesn’t stop there. The best example can be found in the generated ‘work’ of Věra Chytilová, whose obvious ChatGPT copy can be seen in the line below.

“She picks up a paintbrush and dips it into a pot of vibrant red paint, her hand moving with reckless abandon as she splatters the colors across wall in a frenzied burst of creativity.” -ChatGPT (3.5)’s Věra Chytilová

To state that the line above has not been influenced by Daisies (1966), the icon of the Czech New Wave, would be more than an oversight. The connections between the avant-garde trendsetter and this script are astounding, to the point of reusing a device from the film, the spraying of paint across one another. It is not only the messy, nonchalant references to vandalism for the sake of fun, which Daisies (1966) staged with everything from food to chandelier bits being thrown by actresses Jitka Cerhová and Ivana Karbanová at one another in the finale, to give just one example; it is also the function of the piece itself as a highly radical, arguably anarchist, style of filmmaking. This trend of near copy and paste, undoubtedly derived from the software analytics of ChatGPT (3.5), extends beyond the work of Chytilová into a half-baked form of Żuławski’s Possession (1981) from his AI counterpart. Likewise, Mati Diop’s ChatGPT (3.5) recreation holds striking similarities to Atlantique, her Grand Prix-winning picture. However, the largest issue here seems to be a clear relationship between how many vague emotional tone-descriptors a script holds and the director’s identity as a woman.

Upon examination of the scripts that lean most heavily on watery emotional tone-setters rather than direct description of visual storytelling to convey theme and depth, the vast majority of the input writers given to ChatGPT (3.5) are women. Yet the script that delivers the fewest vague and untraceable tones belongs to the relatively unknown female director Larisa Shepitko, whose script mainly contains scene descriptors rather than emotional tone-setting phrases every second word. Still, the only other scripts that clearly depart from conveying the emotions of a scene purely through description are those of Djibril Diop Mambéty and Nagisa Ōshima, the only male directors besides Żuławski. This can be seen in both the content and the delivery of the dialogue and scene descriptions in each director’s generated script. While Mambéty’s script clearly uses visual storytelling in its scene descriptions to relay its message, Ōshima’s generated script is mostly dialogue. Nevertheless, both directors’ AI-generated content tends to depend less on descriptions of the emotional beats and tones of each cinematic moment and more on thematic imagery or dialogue. This raises a worthwhile question about the nature of ChatGPT (3.5), particularly when it comes to the work of female directors. The biggest examples can be found not only in the relatively unknown work of Mati Diop, with vague descriptors of watching the ‘depth of the ocean’ rather than detailed descriptions of the scene, actions, and dialogue to set tone, but also in the work of acclaimed female directors like Chantal Akerman.

“She moves from room to room, her movements a silent symphony of solitude and longing” -ChatGPT (3.5)’s Chantal Akerman

This line, unfortunately, lacks any direct material description or thematic link to the objects, or even lack of objects, in the room, to the actions of the protagonist, or even to uncontrollable elements like the breezes of Diop’s generated script. Here there is simply a descriptor of emotion, with little information given about the setting to bring out a theme; instead, the theme is just told. This trend is alarmingly more present in the scripts for female directors than in those for male directors. In fact, the only male director whose generated script used similar platitudes was Żuławski, in a poor imitation of his film Possession (1981), the only romance presented besides Diop’s work. What makes this use of vague emotional platitudes disturbing is its association with female directors.

To conclude this somewhat disappointing analysis of the abilities of ChatGPT (3.5), there is not a complete lack of creativity and originality in this newly born intelligence system. Frankly, that ChatGPT (3.5) was able to create scripts for both Larisa Shepitko and Mambéty without a direct reference to base a script on, as it had with Akerman’s Jeanne Dielman, 23 quai du Commerce, 1080 Bruxelles, is grounds for real optimism, especially given ChatGPT (3.5)’s general need for updates to its Large Language Model according to analyses from universities and from private and public businesses alike. However, this does not mean that this software is foolproof or free of the clear processing errors that can only come from scripted code like it. The biggest issue is how ChatGPT delivers the emotional weight of each scene. Directors like Shepitko, Mambéty, and Ōshima were given relatively fewer of the poorly developed descriptors whose purpose of setting a tone ironically makes the script more vague and less endearing, but these directors definitely fall into a more underground category, especially when compared with common names in world cinema like Chytilová and Akerman. Still, what outweighs all other concerns is the level of vague emotional descriptors directly tied to either an inserted romance plot or a female director. The fact that the most glaring examples of vague tone-setting, which in fact performs the opposite function, can be found only in romance films or those directed by women is the biggest piece of evidence of ChatGPT’s current limitations. If the countless man-hours behind ChatGPT are to produce a purely creative output, it must understand and get over its inherent biases first.

Bibliography

Frieder, Simon, Luca Pinchetti, Alexis Chevalier, Ryan-Rhys Griffiths, Tommaso Salvatori, Thomas Lukasiewicz, Philipp Petersen, and Julius Berner. “Mathematical Capabilities of ChatGPT.” Advances in Neural Information Processing Systems 36 (2024).

Benzon, William. Discursive Competence in ChatGPT, Part 1: Talking with Dragons. Version 2, Working Paper, 2023.

Shidiq, Muhammad. “The use of artificial intelligence-based chat-gpt and its challenges for the world of education; from the viewpoint of the development of creative writing skills.” In Proceeding of international conference on education, society and humanity, vol. 1, no. 1, pp. 353-357. 2023.

Sveinsson, Oddbergur. “Chomsky vs. ChatGPT: An Examination of Chomsky’s Universal Grammar in the Age of Large Language Models.” PhD diss., 2023.

Chew, Peter. “Empirical Study on the Influence of Different Mathematical Methods (Algebraic Formula Method and Newton Sum Method) on ChatGPT (AI) Competence in Solving Quadratic Root Functions.” (2023).

Wu, Tianyu, Shizhu He, Jingping Liu, Siqi Sun, Kang Liu, Qing-Long Han, and Yang Tang. “A brief overview of ChatGPT: The history, status quo and potential future development.” IEEE/CAA Journal of Automatica Sinica 10, no. 5 (2023): 1122-1136.

Reed, Perry Arthur. “Is ChatGPT Creative? Cognitive-affective Responses to AI-generated Stories.” PhD diss., Fielding Graduate University, 2023.

Chu, Yueying, and Peng Liu. “Public aversion against ChatGPT in creative fields?” The Innovation 4, no. 4 (2023): 100449.
