I'm stuck. I've tried several approaches and I can't get the correct output.

string = "Hello! I have a Big!!! problem 666 is not a good number__$"
ns =''.join([i for i in string if i.isalpha()])
print(ns)

HelloIhaveaBigproblemisnotagoodnumber

I want this output:

Hello I have a Big problem is not a good number

Can you help me? Thank you!!

3
  • But there, you just described what you don't want. You didn't say what you want. I mean, which criterion leads to "Hello I have a Big problem is not a good number"? Do you want to remove digits and punctuation? Or to remove everything but letters and spaces? Commented Nov 11, 2022 at 18:11
  • Thanks for asking. I want to remove the non-alphabetic characters, but I want to keep the string as a sentence, with the words separated by spaces. Commented Nov 11, 2022 at 18:13
  • In general, for example, if I have a string with ()/#@€ Commented Nov 11, 2022 at 18:18

4 Answers

2

You could extend the condition applied to each character so that spaces are kept as well, e.g.,

ns =''.join([i for i in string if i == " " or i.isalpha()])

But there is a problem with sequences like "666 " that leave an extra space in the text.
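One way around the extra space, offered as a sketch rather than as part of the original answer: keep letters and whitespace, then let split() and " ".join() collapse whatever runs of whitespace remain. The helper name keep_letters is hypothetical.

```python
def keep_letters(text):
    """Keep letters and whitespace, then collapse runs of whitespace."""
    # str.isalpha() is Unicode-aware, so accented letters survive too.
    kept = "".join(c for c in text if c.isalpha() or c.isspace())
    # split() with no arguments splits on any run of whitespace and drops
    # empty pieces, so " ".join(...) leaves one space between words.
    return " ".join(kept.split())


string = "Hello! I have a Big!!! problem 666 is not a good number__$"
ns = keep_letters(string)
print(ns)
```

Because split() discards leading and trailing whitespace as well, the result has no stray space at either end.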

Instead, you could use a regex to break the string down into a list of words, each paired with the non-word text that follows it. Filter out the characters you don't want from both parts, and then drop any item whose word ended up empty.

import re

string = "Hello! I have a Big!!! problem 666 is not a good number__$"
tmp = []

for word, other in re.findall(r"(\w+)([^\w]*)", string):
    # strip non-alpha
    word = "".join(c for c in word if c.isalpha())
    # preserve only spaces
    other = "".join(c for c in other if c == " ")
    # only add if word still exists
    if word:
        tmp.append(word + other)
ns = "".join(tmp)
print(ns)

Output

Hello I have a Big problem is not a good number

1 Comment

Pure pythonic!! +1 from my side
2

Match the unwanted characters with a regex and replace them using re.sub:

import re
string = "Hello! I have a Big!!! problem 666 is not a good number__$"
print(re.sub(r'[^a-zA-Z]+', ' ', string))

Output

Hello I have a Big problem is not a good number 

1 Comment

With a warning: It doesn't work for non-English alphabets.
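To address that warning, here is a hedged variant that is not from the answer itself. For str patterns in Python 3, \W and \d are Unicode-aware by default, so the class [\W\d_] matches anything that is not a letter in any alphabet. The function name letters_only is hypothetical, and .strip() is added to drop the leading or trailing space left by punctuation at the ends.

```python
import re


def letters_only(text):
    # \W covers non-word characters; \d and _ remove the digits and the
    # underscore that \w would otherwise let through.
    # Each run of non-letters becomes a single space.
    return re.sub(r"[\W\d_]+", " ", text).strip()


print(letters_only("¡Hola! El niño tiene 3 años__$"))
```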
0

You can use:

import re

string = "Hello! I have a Big!!! problem 666 is not a good number__$"
regex = re.compile('[^a-zA-Z ]')
print(regex.sub('', string))

And it will print:

Hello I have a Big problem  is not a good number

(with a double space between "problem" and "is")

If you want to remove the double space, you can write it like this:

import re

string = "Hello! I have a Big!!! problem 666 is not a good number__$"
regex = re.compile('[^a-zA-Z ]')
string = regex.sub('', string)
print(string.replace('  ', ' '))

And now the output will be:

Hello I have a Big problem is not a good number
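Note that a single replace('  ', ' ') pass only turns each pair of spaces into one, so longer runs are shortened but not fully collapsed. The sketch below uses a made-up input to show the difference and, as one possible alternative, collapses runs of any length with a second regex.

```python
import re

text = "a 1 2 3 b"
# Removing the digits leaves four consecutive spaces between "a" and "b".
stripped = re.sub(r"[^a-zA-Z ]", "", text)

# One replace pass halves the run: four spaces become two.
once = stripped.replace("  ", " ")

# " {2,}" matches a run of two or more spaces, whatever its length.
collapsed = re.sub(r" {2,}", " ", stripped)

print(repr(once))
print(repr(collapsed))
```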

0

A lot of simple problems can be solved without the re library.

For this case, you can filter out every character that is neither a letter of the alphabet nor a space:

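from string import ascii_lowercase
s = "Hello! I have a Big!!! problem 666 is not a good number__$"
# keep only ASCII letters and spaces
ns = ''.join(filter(lambda c: c.lower() in ascii_lowercase + ' ', s))
# collapse repeated spaces into one
while '  ' in ns: ns = ns.replace('  ', ' ')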


# output:
# 'Hello I have a Big problem is not a good number'

Repeated spaces are collapsed by the one-line while loop above.


To work with non-English characters, you can replace ascii_lowercase with the desired set of characters.
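As an illustration of that substitution, here is a minimal sketch with an assumed set of extra Spanish letters. The names EXTRA_LETTERS, ALLOWED and filter_allowed are hypothetical, and the character list would need adjusting for other languages.

```python
from string import ascii_letters

# Assumed extra letters for Spanish text; adjust to the language at hand.
EXTRA_LETTERS = "áéíóúüñÁÉÍÓÚÜÑ"
ALLOWED = set(ascii_letters + EXTRA_LETTERS + " ")


def filter_allowed(text):
    # Keep only the characters listed in ALLOWED.
    kept = "".join(c for c in text if c in ALLOWED)
    # Collapse the repeated spaces left behind by removed tokens.
    while "  " in kept:
        kept = kept.replace("  ", " ")
    return kept.strip()


print(filter_allowed("El niño tiene 3 años!"))
```

If the set of letters is not known in advance, str.isalpha() is an alternative, since it accepts letters from any alphabet.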

3 Comments

But then you have 2 spaces between problem and is
The updated version also solves this problem.
Thanks to all. Now I can continue learning Python. Greetings.
