3

I have a string that can take one of these forms:

s1 = "Hello HAHA"
s2 = '["Hello HAHA"]'
s3 = "{Hello HAHA}"

I want to find out whether the input string looks like s1, s2 or s3. My goal is to sanitise this input and save it in s1 format.

Basically I need to know if the input string is in s1 form or not.

Solutions I have thought of:

  1. Call json.loads(s) and catch the exception to check whether the string is JSON.
  2. Use a regex to check whether the input string starts or ends with {, }, [ or ], and strip those characters.

What is the most Pythonic way to go about it?

3
  • IMHO, the most pythonic way would be to try parsing as JSON and catching the exception. Python's exceptions are cheap and meant to be used extensively. Commented Feb 24, 2016 at 8:17
  • Yo but that will just solve for s3, not s2. That is where the bigger problem is @jsfan Commented Feb 24, 2016 at 8:22
  • Correction: json.loads will work for s2 too and will thus solve problem :D Commented Nov 24, 2016 at 5:36
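
A minimal sketch of the try/except approach from these comments, assuming only the three input forms from the question. The helper name to_plain is hypothetical, and the brace handling for s3 is an assumption, since s3 is not valid JSON and json.loads rejects it just as it rejects s1.

```python
import json

def to_plain(s):
    """Return the s1-style text for an s1, s2 or s3 input (hypothetical helper)."""
    try:
        value = json.loads(s)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        # Not valid JSON: this covers both s1 and s3.
        # Assumption: for s3, drop exactly one pair of outer braces.
        if len(s) >= 2 and s.startswith("{") and s.endswith("}"):
            return s[1:-1]
        return s
    # Valid JSON: unwrap a one-element list holding a string, such as s2.
    if isinstance(value, list) and len(value) == 1 and isinstance(value[0], str):
        return value[0]
    # Any other valid JSON (numbers, objects, longer lists) is left untouched.
    return s

for sample in ("Hello HAHA", '["Hello HAHA"]', "{Hello HAHA}"):
    print(to_plain(sample))
```

All three sample inputs come out as Hello HAHA. What to do with other valid JSON is a design choice this sketch leaves open by returning the input unchanged.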

3 Answers

2

Use strip. s.strip('[]"{}') will remove the unwanted characters at the ends of the string.

>>> unwanted = '[]"{}'

>>> 'Hello HAHA'.strip(unwanted)
'Hello HAHA'

>>> '["Hello HAHA"]'.strip(unwanted)
'Hello HAHA'

>>> '{Hello HAHA}'.strip(unwanted)
'Hello HAHA'
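
One caveat worth checking before relying on this: strip removes any run of the listed characters from both ends, whether or not they form a matching pair. The sketch below uses a hypothetical helper name, strip_ends, and made-up sample inputs to show how an s1-style string that legitimately starts or ends with one of these characters gets altered too.

```python
UNWANTED = '[]"{}'

def strip_ends(s):
    """Remove every leading and trailing character found in UNWANTED."""
    return s.strip(UNWANTED)

# The three forms from the question all reduce to the same text.
for sample in ('Hello HAHA', '["Hello HAHA"]', '{Hello HAHA}'):
    print(strip_ends(sample))

# A plain string that happens to start with a quote and end with a bracket
# loses those characters as well; the inner ones are left alone.
print(strip_ends('"Hello" said [Bob]'))
```

If such inputs can occur, parsing the string first and only unwrapping what actually parsed may be the safer route.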

2 Comments

How is this answering the question? For example, not every string that begins with [ and ends with ] is a valid list after evaluation. Did I misunderstand something here?
@timgeb The question says there are 3 forms.
1

Your approach for the JSON string is correct. I'd check for the list like this:

>>> from ast import literal_eval
>>> def is_listliteral(x):
...     try:
...         return isinstance(literal_eval(x), list)
...     except (SyntaxError, ValueError):
...         return False
>>> is_listliteral('[')
False
>>> is_listliteral('[1,"2",{}]')
True
>>> is_listliteral('{}')
False

And I'm sure you can write the conditional statements to check for either JSON or list and then return True for the string check if both of those fail.
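
The conditional described above could look roughly like this. It is a sketch, not a definitive implementation: is_json and is_plain_string are hypothetical names, and is_listliteral is repeated so the block runs on its own. Note that s3 ("{Hello HAHA}") is neither valid JSON nor a valid Python literal, so this check reports it as a plain string and it would still need separate handling.

```python
import json
from ast import literal_eval

def is_json(s):
    """True if s parses as JSON."""
    try:
        json.loads(s)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False

def is_listliteral(x):
    """True if x is the source text of a Python list literal."""
    try:
        return isinstance(literal_eval(x), list)
    except (SyntaxError, ValueError):
        return False

def is_plain_string(s):
    """True only when both structured checks fail."""
    return not (is_json(s) or is_listliteral(s))

for sample in ("Hello HAHA", '["Hello HAHA"]', "{Hello HAHA}"):
    print(repr(sample), is_plain_string(sample))
```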

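edit: There's a downside: this solution only works for nested lists if every object inside can be evaluated by literal_eval. On Python 2 that means strings, numbers, tuples, lists, dicts, booleans, and None. Since Python 3.2, bytes and set literals are accepted as well, so the example below returns True there.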

>>> is_listliteral('[1,2,{1,2,3}]')
False

So it's not perfect. It might be good enough for your case. I don't know a better solution for now.

2 Comments

ast.literal_eval is the big takeaway here. I can use it to test for json as well. Any idea about what is more efficient though? ujson.loads() or ast.literal_eval()? @timgeb
@ketanbhatt "I can use it to test for json as well." <- why would you? Any string that would be a valid python dictionary could be evaluated.
1

Regex search to see if the input string has {|}|[|] in it in the starting and end position, and replace them.

re.sub(r'^\W+|\W+$', '', string)

or

re.sub(r'''^[\[{"']+|['"}\]]+$''', '', string)

The first pattern removes one or more non-word characters at the start or at the end of the string. The second removes only brackets, braces and quotes in those positions.
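
A short sketch comparing the two patterns on the inputs from the question. The function names are hypothetical. The second pattern is written as a triple-quoted raw string because it contains both kinds of quote character.

```python
import re

# Any run of non-word characters at either end of the string.
EDGE_NONWORD = re.compile(r'^\W+|\W+$')
# Only brackets, braces and quotes at either end.
EDGE_WRAPPERS = re.compile(r'''^[\[{"']+|['"}\]]+$''')

def strip_nonword(s):
    return EDGE_NONWORD.sub('', s)

def strip_wrapper_chars(s):
    return EDGE_WRAPPERS.sub('', s)

for sample in ('Hello HAHA', '["Hello HAHA"]', '{Hello HAHA}'):
    print(strip_nonword(sample), '|', strip_wrapper_chars(sample))

# The broader pattern also removes trailing punctuation; the narrower one keeps it.
print(strip_nonword('Hello HAHA!'), '|', strip_wrapper_chars('Hello HAHA!'))
```

Which pattern fits depends on whether punctuation at the ends of an s1-style string should survive.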

1 Comment

@PeterWood \W should also match quotes.
