I have the following pydantic model with six fields, two of which are nullable.
from pydantic import BaseModel
from typing import Optional

class Purchases(BaseModel):
    customer_id: int
    customer_name: str
    purchase_date: str
    city: str
    customer_nickname: Optional[str] = None
    customer_address_type: Optional[str] = None

def insert_purchases(prchs: Purchases) -> None:
    insert_qry = f"""
        INSERT INTO cust_db.purchases(customer_id, customer_name, purchase_date, city, customer_nickname, customer_address_type)
        VALUES ({prchs['customer_id']},
                '{prchs['customer_name']}',
                '{prchs['purchase_date']}',
                '{prchs['city']}',
                '{prchs['customer_nickname']}',
                '{prchs['customer_address_type']}'
        )"""
    print(insert_qry)
    spark.sql(insert_qry)
My input is as follows:
purchase_input = {"customer_id": 3, "customer_name": "John Doe", "purchase_date": "2011-12-27 13:04:52", "city": "New York", "customer_nickname": None, "customer_address_type": None}
I performed the INSERT operation using the below lines:
prch_obj = Purchases
prch_obj.insert_purchases(purchase_input)
But both customer_nickname and customer_address_type are inserted as the string 'None' instead of a null value:
INSERT INTO cust_db.purchases(customer_id, customer_name, purchase_date, city, customer_nickname, customer_address_type)
VALUES (3,
'John Doe',
'2011-12-27 13:04:52',
'New York',
'None',
'None'
)
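As far as I can tell, this has nothing to do with Spark itself: an f-string formats None with str(), and the quotes in my template then wrap the result as a SQL string literal. A minimal reproduction (the variable names are only for illustration):

```python
# Minimal reproduction, no Spark needed.
# f-string interpolation calls str() on the value, so None becomes "None",
# and the surrounding single quotes turn it into the SQL string literal 'None'.
nickname = None
fragment = f"'{nickname}'"
print(fragment)
```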
The only way I'm able to load null values into these fields is by using CASE in my INSERT statement for those two fields:
def insert_purchases(prchs: Purchases) -> None:
    insert_qry = f"""
        INSERT INTO cust_db.purchases(customer_id, customer_name, purchase_date, city, customer_nickname, customer_address_type)
        VALUES ({prchs['customer_id']},
                '{prchs['customer_name']}',
                '{prchs['purchase_date']}',
                '{prchs['city']}',
                CASE WHEN '{prchs['customer_nickname']}' = 'None' THEN NULL else '{prchs['customer_nickname']}' END,
                CASE WHEN '{prchs['customer_address_type']}' = 'None' THEN NULL else '{prchs['customer_address_type']}' END
        )"""
    print(insert_qry)
    spark.sql(insert_qry)
which then correctly inserts null values.
But I have many tables where I need to handle incoming None/NULL values, and I think using CASE is overkill when I have to do it for hundreds of fields.
Is there a better way to achieve this?
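To make the goal concrete, what I am hoping for is a single place that decides how each Python value is rendered, so that None becomes a bare NULL and strings get quoted. A rough sketch of the idea, where sql_literal and build_insert are hypothetical helpers of my own (not part of pydantic or Spark), and the escaping assumes Spark SQL's default backslash escapes in string literals:

```python
from typing import Any, Dict


def sql_literal(value: Any) -> str:
    """Render a Python value as a SQL literal (hypothetical helper)."""
    if value is None:
        return "NULL"  # bare keyword, deliberately not quoted
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    # Assumption: backslash escaping, as in Spark SQL's default string literal parsing.
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_insert(table: str, row: Dict[str, Any]) -> str:
    """Build an INSERT statement from a column -> value mapping (hypothetical helper)."""
    columns = ", ".join(row)
    values = ", ".join(sql_literal(v) for v in row.values())
    return f"INSERT INTO {table}({columns}) VALUES ({values})"


example_row = {
    "customer_id": 3,
    "customer_name": "John Doe",
    "customer_nickname": None,
}
example_qry = build_insert("cust_db.purchases", example_row)
print(example_qry)
```

Something like this would keep the per-field CASE expressions out of the SQL, but building statements from strings is still fragile, so if the Spark version in use supports parameterized queries, that would probably be the safer route.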

