I needed to convert an SQL export with a list of single line INSERT statements into UPDATE statements.
Instead of writing the python code by hand, I used ChatGPT to generate a python script to perform the task.
This blog post describes the journey to reach the desired outcome, and highlights that ChatGPT can quite often solve simple cases, but when it comes to details, it requires plenty of iterations and prompt refining.

First simple ChatGPT prompt

In my first ChatGPT prompt for this problem, I wrote simple instructions:

create python code to convert a list of sql insert statements into a list of sql update statements.


It generated the python script below:
import re

def convert_insert_to_update(insert_statements, primary_key_column):
    update_statements = []

    insert_pattern = re.compile(
        r"INSERT INTO (\w+) \(([^)]+)\) VALUES \(([^)]+)\);",
        re.IGNORECASE
    )

    for insert in insert_statements:
        match = insert_pattern.match(insert)
        if match:
            table_name = match.group(1)
            columns = match.group(2).split(', ')
            values = match.group(3).split(', ')

            # Assuming the primary key column is the first column for simplicity
            # Adjust this according to your actual primary key column
            primary_key_value = values[columns.index(primary_key_column)]

            set_clause = ", ".join(
                f"{col}={val}" for col, val in zip(columns, values)
            )

            update_statement = f"UPDATE {table_name} SET {set_clause} WHERE {primary_key_column}={primary_key_value};"
            update_statements.append(update_statement)
        else:
            print(f"Could not parse insert statement: {insert}")

    return update_statements

# Example usage:
insert_statements = [
    "INSERT INTO employees (id, name, age, department) VALUES (1, 'John Doe', 30, 'HR');",
    "INSERT INTO employees (id, name, age, department) VALUES (2, 'Jane Smith', 25, 'Finance');",
]

primary_key_column = 'id'
update_statements = convert_insert_to_update(insert_statements, primary_key_column)

for stmt in update_statements:
    print(stmt)

When I execute the generated script, it outputs:
UPDATE employees SET id=1, name='John Doe', age=30, department='HR' WHERE id=1;
UPDATE employees SET id=2, name='Jane Smith', age=25, department='Finance' WHERE id=2;

At first glance, all looks good, but the devil is in the details. For the simple example provided by ChatGPT, it outputs the correct SQL UPDATE statements, but for real data with more complex values, including commas, single quotes and double quotes, it fails.

If I tweak the insert_statements assignment of the python code provided by ChatGPT to:
insert_statements = [
    "INSERT INTO employees (id, name, age, department) VALUES (1, 'John, Comma', 30, 'HR');",
    "INSERT INTO employees (id, name, age, department) VALUES (2, 'Jane\\\" Smith\\\"', 25, 'Finance');",
    "INSERT INTO employees (id, name, age, department) VALUES (2, 'Jane\\\' Smith\\\'', 25, 'Finance');",
]

It will output:
UPDATE employees SET id=1, name='John, age=Comma', department=30 WHERE id=1;
UPDATE employees SET id=2, name='Jane\" Smith\"', age=25, department='Finance' WHERE id=2;
UPDATE employees SET id=2, name='Jane\' Smith\'', age=25, department='Finance' WHERE id=2;

The initial python code handles the quotes correctly; however, it fails when a value contains a comma.
In the first line it should be ...name='John, Comma', age=30... and not name='John, age=Comma'.
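
To see why, here is a minimal illustration of my own (not part of the generated script) of what splitting the values part on ", " does to a quoted value containing a comma:
values_str = "1, 'John, Comma', 30, 'HR'"
print(values_str.split(', '))
# prints: ['1', "'John", "Comma'", '30', "'HR'"]
# the comma inside 'John, Comma' is treated as a separator, shifting every value after it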

Second ChatGPT prompt with problematic characters

This time, I extended the prompt to take into account the characters that I knew would break the first solution.

create python code to convert a list of sql insert statements into a list of sql update statements, taking into account that the sql text field values can contain commas, single quotes and double quotes. The input example should have text fields with spaces, single quotes, double quotes and commas


The generated python output was:
import re


def extract_columns_and_values(insert_statement):
    """
    Extract columns and values from an SQL INSERT statement.

    Parameters:
    insert_statement (str): The SQL INSERT statement.

    Returns:
    tuple: A tuple containing a list of columns and a list of values.
    """
    match = re.match(r"INSERT INTO \w+ \((.*?)\) VALUES \((.*?)\);", insert_statement, re.IGNORECASE)
    if not match:
        raise ValueError("Invalid INSERT statement")

    columns = match.group(1).split(', ')
    values = re.findall(r"(?:'([^']*)'|\"([^\"]*)\"|([^,]+))(?:, |$)", match.group(2))

    # Flatten the list of tuples and keep only non-empty strings
    values = [val for tup in values for val in tup if val]

    return columns, values


def convert_to_update(insert_statements):
    """
    Convert a list of SQL INSERT statements to a list of SQL UPDATE statements.

    Parameters:
    insert_statements (list): A list of SQL INSERT statements.

    Returns:
    list: A list of SQL UPDATE statements.
    """
    update_statements = []

    for insert_statement in insert_statements:
        table_match = re.match(r"INSERT INTO (\w+)", insert_statement, re.IGNORECASE)
        if not table_match:
            raise ValueError("Invalid INSERT statement")

        table_name = table_match.group(1)
        columns, values = extract_columns_and_values(insert_statement)

        set_clause = ", ".join([f"{col} = '{val}'" for col, val in zip(columns, values)])

        # Assuming the primary key is the first column
        primary_key_column = columns[0]
        primary_key_value = values[0]

        update_statement = f"UPDATE {table_name} SET {set_clause} WHERE {primary_key_column} = '{primary_key_value}';"
        update_statements.append(update_statement)

    return update_statements


# Example usage
insert_statements = [
    "INSERT INTO my_table (id, name, description) VALUES (1, 'John Doe', 'This is a test, with commas and ''single quotes'' and \"double quotes\".');",
    "INSERT INTO my_table (id, name, description) VALUES (2, 'Jane Smith', 'Another test with, more text and ''quotes''.');"
]

update_statements = convert_to_update(insert_statements)
for stmt in update_statements:
    print(stmt)

When I execute the generated python script, it outputs:
UPDATE my_table SET id = '1', name = 'John Doe', description = ''This is a test' WHERE id = '1';
UPDATE my_table SET id = '2', name = 'Jane Smith', description = ''Another test with' WHERE id = '2';

It was clear that ChatGPT failed to provide a correct solution, both in handling the commas and in generating an example input with quotes.

Third ChatGPT prompt using a csv reader

Instead of giving up on ChatGPT and doing manual tweaks, I decided to try again, but this time using my knowledge as a programmer.
Using regular expressions, as ChatGPT proposed, isn't the best approach to deal with values that include special characters, mainly commas inside quoted values.
An SQL INSERT statement can be decomposed into 3 parts:

  1. a prefix INSERT INTO my_table (field list) VALUES (.
  2. a comma separated list of values, quoted or unquoted.
  3. a final bracket closure );.

The special characters problem is in the second part, which is nothing but a csv line; therefore, using the python csv library solves the problem without all the extra complexity.
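
As a rough sketch of this idea (assuming the values part has already been extracted from the statement), the csv module splits it correctly while respecting the single quotes:
import io
import csv

# Rough sketch: parse the values part of an INSERT with csv instead of a plain split.
values_str = "1, 'John, Comma', 30, 'HR'"
csv_reader = csv.reader(io.StringIO(values_str), quotechar="'", skipinitialspace=True)
print(next(csv_reader))
# prints: ['1', 'John, Comma', '30', 'HR']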

To make sure that ChatGPT was on the same page, I first asked it whether it could create python scripts using the csv library.

write a python script that reads a csv data from a memory stream


The generated script is:
import io
import csv

# Example CSV data
csv_data = """id,name,description
1,John Doe,"This is a test, with commas and 'single quotes' and \"double quotes\"."
2,Jane Smith,"Another test with, more text and 'quotes' and special characters."
"""

# Create an in-memory stream
csv_stream = io.StringIO(csv_data)

# Read the CSV data from the memory stream
csv_reader = csv.reader(csv_stream)

# Process the CSV data
headers = next(csv_reader)  # Read the header row
rows = [row for row in csv_reader]  # Read the remaining rows

# Print the headers and rows
print("Headers:", headers)
print("Rows:")
for row in rows:
    print(row)

# Close the in-memory stream
csv_stream.close()

Running the python script, it outputs:
Headers: ['id', 'name', 'description']
Rows:
['1', 'John Doe', 'This is a test, with commas and \'single quotes\' and double quotes"."']
['2', 'Jane Smith', "Another test with, more text and 'quotes' and special characters."]

Sweet, it knows how to use csv data without generating files.
Perfect! Now let us tweak the prompt to use this method:

create a Python script that converts a list of SQL INSERT statements into a list of SQL UPDATE statements, we need to ensure that the script correctly handles text fields containing spaces, single quotes, double quotes, and commas.
To solve the problem with the quotes, use csv and read it from a memory stream

import re
import io
import csv


def extract_columns_and_values(insert_statement):
    """
    Extract columns and values from an SQL INSERT statement.

    Parameters:
    insert_statement (str): The SQL INSERT statement.

    Returns:
    tuple: A tuple containing a list of columns and a list of values.
    """
    # Match the table name, columns, and values
    match = re.match(r"INSERT INTO (\w+) \((.*?)\) VALUES \((.*?)\);", insert_statement, re.IGNORECASE)
    if not match:
        raise ValueError("Invalid INSERT statement")

    table_name = match.group(1)
    columns = match.group(2).split(', ')
    values_str = match.group(3)

    # Use csv.reader to correctly parse the values part
    csv_stream = io.StringIO(values_str)
    csv_reader = csv.reader(csv_stream, quotechar="'", skipinitialspace=True)
    values = next(csv_reader)

    return table_name, columns, values


def convert_to_update(insert_statements):
    """
    Convert a list of SQL INSERT statements to a list of SQL UPDATE statements.

    Parameters:
    insert_statements (list): A list of SQL INSERT statements.

    Returns:
    list: A list of SQL UPDATE statements.
    """
    update_statements = []

    for insert_statement in insert_statements:
        table_name, columns, values = extract_columns_and_values(insert_statement)

        set_clause = ", ".join([f"{col} = '{val}'" for col, val in zip(columns, values)])

        # Assuming the primary key is the first column
        primary_key_column = columns[0]
        primary_key_value = values[0]

        update_statement = f"UPDATE {table_name} SET {set_clause} WHERE {primary_key_column} = '{primary_key_value}';"
        update_statements.append(update_statement)

    return update_statements


# Example usage
insert_statements = [
    "INSERT INTO my_table (id, name, description) VALUES (1, 'John Doe', 'This is a test, with commas and ''single quotes'' and \"double quotes\".');",
    "INSERT INTO my_table (id, name, description) VALUES (2, 'Jane Smith', 'Another test with, more text and ''quotes''.');"
]

update_statements = convert_to_update(insert_statements)
for stmt in update_statements:
    print(stmt)

Running the generated script, it will output:
UPDATE my_table SET id = '1', name = 'John Doe', description = 'This is a test, with commas and 'single quotes' and "double quotes".' WHERE id = '1';
UPDATE my_table SET id = '2', name = 'Jane Smith', description = 'Another test with, more text and 'quotes'.' WHERE id = '2';

The output handles the commas correctly, but there are still some issues to address:

  • SQL exports, such as mysqldump, export single quotes as \' instead of '' (a small test of this escaping is sketched after this list).
  • The example input doesn't add a slash before the single quotes, since it is python code and not SQL text, and in python source a backslash must be duplicated.

Fourth ChatGPT prompt, reading input file

Instead of explaining to ChatGPT that it should duplicate the slashes before the quotes because it's python code, not SQL data, I decided it would be best to ask it to read the input from a file instead of hardcoding it, which also represents the most common case.
I attempted to add more instructions to the prompt, but ChatGPT kept failing and stopped using the csv library altogether, so instead I gave it a code snippet and asked it to tweak it.
It took me a few prompts before it understood that \' should be used instead of ''.

Given this prompt:

change the following python script to read the array from a file instead of assignment.
use the example on code below to give an example of SQL input file.
On the input file where there are pairs of single quotes, it should be a single slash followed by a single quote. The python code shouldn't change due this last request, that is, python must not do any parsing of slash quotes.
# Example usage
insert_statements = [
    "INSERT INTO my_table (id, name, description) VALUES (1, 'John Doe', 'This is a test, with commas and ''single quotes'' and \"double quotes\".');",
    "INSERT INTO my_table (id, name, description) VALUES (2, 'Jane Smith', 'Another test with, more text and ''quotes''.');"
]



ChatGPT generated the following insert_statements.sql:
INSERT INTO my_table (id, name, description) VALUES (1, 'John Doe', 'This is a test, with commas and \'single quotes\' and "double quotes".');
INSERT INTO my_table (id, name, description) VALUES (2, 'Jane Smith', 'Another test with, more text and \'quotes\'.');

and it generated the following python code:
# Function to read SQL insert statements from a file
def read_sql_from_file(filename):
    with open(filename, 'r') as file:
        insert_statements = [line.strip() for line in file if line.strip()]
    return insert_statements


# Example usage
filename = 'insert_statements.sql'
insert_statements = read_sql_from_file(filename)

# Display the insert statements
for stmt in insert_statements:
    print(stmt)

I then merged this code snippet with the previous code:
import re
import io
import csv


def extract_columns_and_values(insert_statement):
    """
    Extract columns and values from an SQL INSERT statement.

    Parameters:
    insert_statement (str): The SQL INSERT statement.

    Returns:
    tuple: A tuple containing a list of columns and a list of values.
    """
    # Match the table name, columns, and values
    match = re.match(r"INSERT INTO (\w+) \((.*?)\) VALUES \((.*?)\);", insert_statement, re.IGNORECASE)
    if not match:
        raise ValueError("Invalid INSERT statement")

    table_name = match.group(1)
    columns = match.group(2).split(', ')
    values_str = match.group(3)

    # Use csv.reader to correctly parse the values part
    csv_stream = io.StringIO(values_str)
    csv_reader = csv.reader(csv_stream, quotechar="'", skipinitialspace=True)
    values = next(csv_reader)

    return table_name, columns, values


def convert_to_update(insert_statements):
    """
    Convert a list of SQL INSERT statements to a list of SQL UPDATE statements.

    Parameters:
    insert_statements (list): A list of SQL INSERT statements.

    Returns:
    list: A list of SQL UPDATE statements.
    """
    update_statements = []

    for insert_statement in insert_statements:
        table_name, columns, values = extract_columns_and_values(insert_statement)

        set_clause = ", ".join([f"{col} = '{val}'" for col, val in zip(columns, values)])

        # Assuming the primary key is the first column
        primary_key_column = columns[0]
        primary_key_value = values[0]

        update_statement = f"UPDATE {table_name} SET {set_clause} WHERE {primary_key_column} = '{primary_key_value}';"
        update_statements.append(update_statement)

    return update_statements

##### New Code inserted here #####

# Function to read SQL insert statements from a file


def read_sql_from_file(filename):
    with open(filename, 'r') as file:
        insert_statements = [line.strip() for line in file if line.strip()]
    return insert_statements


# Example usage
filename = 'insert_statements.sql'
insert_statements = read_sql_from_file(filename)

# Display the insert statements
for stmt in insert_statements:
    print(stmt)

##### End new code ####

update_statements = convert_to_update(insert_statements)
for stmt in update_statements:
    print(stmt)

Running this script, it outputs:
INSERT INTO my_table (id, name, description) VALUES (1, 'John Doe', 'This is a test, with commas and \'single quotes\' and "double quotes".');
INSERT INTO my_table (id, name, description) VALUES (2, 'Jane Smith', 'Another test with, more text and \'quotes\'.');
UPDATE my_table SET id = '1', name = 'John Doe', description = 'This is a test, with commas and \single quotes\' and "double quotes".'' WHERE id = '1';
UPDATE my_table SET id = '2', name = 'Jane Smith', description = 'Another test with, more text and \quotes\'.'' WHERE id = '2';


The generated python code was closer to my final goal, but it still didn't solve the problem with the quotes.
At this point, I considered it would be faster to make some final manual tweaks to the code and the input file, since ChatGPT's attempts kept producing invalid results and often stopped using csv to split the values.

The manual tweaks were adding a slash function and the escapechar="\\" parameter to csv.reader.
Here is the final result with the manual tweaks:
import re
import io
import csv


def extract_columns_and_values(insert_statement):
    """
    Extract columns and values from an SQL INSERT statement.

    Parameters:
    insert_statement (str): The SQL INSERT statement.

    Returns:
    tuple: A tuple containing a list of columns and a list of values.
    """
    # Match the table name, columns, and values
    match = re.match(r"INSERT INTO (\w+) \((.*?)\) VALUES \((.*?)\);", insert_statement, re.IGNORECASE)
    if not match:
        raise ValueError("Invalid INSERT statement")

    table_name = match.group(1)
    columns = match.group(2).split(', ')
    values_str = match.group(3)

    # Use csv.reader to correctly parse the values part
    csv_stream = io.StringIO(values_str)
    csv_reader = csv.reader(csv_stream, quotechar="'", skipinitialspace=True, escapechar="\\")  # added the escape char
    values = next(csv_reader)

    return table_name, columns, values


def slash(value):  # added function
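    # prepend a backslash to every single or double quote in the value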
    return re.sub("(['\"])", r"\\\1", value)


def convert_to_update(insert_statements):
    """
    Convert a list of SQL INSERT statements to a list of SQL UPDATE statements.

    Parameters:
    insert_statements (list): A list of SQL INSERT statements.

    Returns:
    list: A list of SQL UPDATE statements.
    """
    update_statements = []

    for insert_statement in insert_statements:
        table_name, columns, values = extract_columns_and_values(insert_statement)

        set_clause = ", ".join([f"{col} = '{slash(val)}'" for col, val in zip(columns, values)])

        # Assuming the primary key is the first column
        primary_key_column = columns[0]
        primary_key_value = values[0]

        update_statement = f"UPDATE {table_name} SET {set_clause} WHERE {primary_key_column} = '{primary_key_value}';"
        update_statements.append(update_statement)

    return update_statements

# Function to read SQL insert statements from a file


def read_sql_from_file(filename):
    with open(filename, 'r') as file:
        insert_statements = [line.strip() for line in file if line.strip()]
    return insert_statements


# Example usage
filename = 'insert_statements1.sql'
insert_statements = read_sql_from_file(filename)

# Display the insert statements
for stmt in insert_statements:
    print(stmt)

update_statements = convert_to_update(insert_statements)
for stmt in update_statements:
    print(stmt)

and the tweaked insert_statements1.sql with correct quoting:
INSERT INTO my_table (id, name, description) VALUES (1, 'John Doe', 'This is a test, with commas and \'single quotes\' and \"double quotes\".');
INSERT INTO my_table (id, name, description) VALUES (2, 'Jane Smith', 'Another test with, more text and \'quotes\'.');

It generates the output:
INSERT INTO my_table (id, name, description) VALUES (1, 'John Doe', 'This is a test, with commas and \'single quotes\' and \"double quotes\".');
INSERT INTO my_table (id, name, description) VALUES (2, 'Jane Smith', 'Another test with, more text and \'quotes\'.');
UPDATE my_table SET id = '1', name = 'John Doe', description = 'This is a test, with commas and \'single quotes\' and \"double quotes\".' WHERE id = '1';
UPDATE my_table SET id = '2', name = 'Jane Smith', description = 'Another test with, more text and \'quotes\'.' WHERE id = '2';

Conclusion

The output was finally correct; however, this code still doesn't address:

  1. newline characters inside the values.
  2. NULL values, which should be left unquoted.
  3. Numerical values, which should also be unquoted (a rough sketch for points 2 and 3 follows this list).
  4. SQL exports that first list the table creation and later issue INSERT statements without an explicit column list.
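
For points 2 and 3, a possible direction (just a rough sketch reusing the slash function above, not something tested against real exports) would be a small value formatter that leaves NULL and numeric literals unquoted:
import re

def format_value(val):
    # Leave NULL and plain numeric literals unquoted; quote and escape everything else.
    if val.upper() == 'NULL':
        return 'NULL'
    if re.fullmatch(r'-?\d+(\.\d+)?', val):
        return val
    return f"'{slash(val)}'"

# The set clause in convert_to_update would then use format_value:
# set_clause = ", ".join([f"{col} = {format_value(val)}" for col, val in zip(columns, values)])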