I have a tab-delimited file in the format:

sentenceID (sid)    documentID (scid)   sentenceText (sent)

E.g.

100004  100 即便您喜爱流连酒吧,也定然在这轻松安闲的一隅,来一场甜蜜沉醉的约会。
100005  100 您可以慢慢探究菜单上所有的秘密惊喜。

I want to put it into sqlite3 with the following schema:

CREATE TABLE sent (
    sid INTEGER PRIMARY KEY,
    scid INTEGER,
    sent TEXT
    );

Is there a quick way to use the Python API for SQLite (http://docs.python.org/2/library/sqlite3.html) to load these rows into a table?

This is what I've been doing so far:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import sqlite3 as lite
import sys, codecs

con = lite.connect('mycorpus.db')

with con:    
    cur = con.cursor()
    cur.execute("CREATE TABLE Corpus(sid INT, scid INT, sent TEXT, PRIMARY KEY (sid))")
    for line in codecs.open('corpus.tab','r','utf8'):
        sid,scid,sent = line.strip().split("\t")
        cur.execute("INSERT INTO Corpus VALUES("+sid+","+scid+",'"+sent+"')")

2 Answers

Here's an example using the unicodecsv module:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import sqlite3

import unicodecsv


con = sqlite3.connect('mycorpus.db')
cur = con.cursor()
cur.execute("CREATE TABLE Corpus(sid INT, scid INT, sent TEXT, PRIMARY KEY (sid))")

with open('corpus.tab', 'rb') as input_file:
    reader = unicodecsv.reader(input_file, delimiter="\t")
    data = [row for row in reader]

cur.executemany("INSERT INTO Corpus (sid, scid, sent) VALUES (?, ?, ?);", data)
con.commit()
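
For readers on Python 3, the same idea can be sketched with only the standard library, since the built-in `csv` module reads Unicode text directly. This is a minimal sketch under assumptions, not the answer's original code: the function name `load_tab_file` is hypothetical, the table uses `INTEGER PRIMARY KEY` as in the schema the question asks for, and `CREATE TABLE IF NOT EXISTS` is added so the function can be called more than once.

```python
import csv
import sqlite3


def load_tab_file(path, con):
    """Load a UTF-8, tab-delimited file of (sid, scid, sent) rows into Corpus.

    `load_tab_file` is a hypothetical helper name used for illustration.
    """
    con.execute(
        "CREATE TABLE IF NOT EXISTS Corpus("
        "sid INTEGER PRIMARY KEY, scid INTEGER, sent TEXT)"
    )
    # newline="" is what the csv documentation recommends for files
    # that are handed to csv.reader.
    with open(path, encoding="utf-8", newline="") as input_file:
        reader = csv.reader(input_file, delimiter="\t")
        # executemany accepts any iterable of rows, so the reader is passed
        # directly instead of being copied into a list first.
        con.executemany(
            "INSERT INTO Corpus (sid, scid, sent) VALUES (?, ?, ?)", reader
        )
    con.commit()
```

It could be called as `load_tab_file('corpus.tab', sqlite3.connect('mycorpus.db'))`. Every line is assumed to have exactly three tab-separated fields; a blank or short line would make `executemany` raise an error.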

Hope that helps.

2 Comments

Quotes? I don't see any quotes in your example. In any case, you can pass an appropriate quotechar to the reader object or set its quoting option. See the docs.
=) No worries, your code works for input without " or '. It's just that some other lines in my file have crazy quotation marks.
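
One way to handle the stray quotation marks mentioned in these comments is to turn off quote processing in the reader. The sketch below uses Python 3's built-in `csv` module with `quoting=csv.QUOTE_NONE`, which tells the reader to treat `"` and `'` as ordinary characters. The helper name `parse_rows` and the sample lines are made up for illustration, and it assumes the sentence text never contains a literal tab.

```python
import csv
import io


def parse_rows(text_stream):
    """Split tab-delimited lines into fields without interpreting quotes.

    `parse_rows` is a hypothetical helper name used for illustration.
    """
    reader = csv.reader(text_stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


# Made-up sample: one field opens with an unbalanced double quote,
# the other mixes a single and a double quote.
sample = (
    '100006\t100\t"unbalanced opening quote\n'
    "100007\t100\tit's 5\" wide\n"
)
rows = parse_rows(io.StringIO(sample))
```

With the default quoting, a field that starts with an unbalanced `"` could swallow the lines that follow it; with `QUOTE_NONE` each line stays a separate three-field row.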
    #!/usr/bin/python
    # -*- coding: utf-8 -*-

    import codecs
    import sqlite3 as lite

    con = lite.connect('myCorpus.db')
    cur = con.cursor()

    cur.execute("CREATE TABLE Corpus(sid INT, scid INT, sent TEXT, PRIMARY KEY (sid))")

    # Decode the file as UTF-8 and drop the trailing newline before splitting on tabs.
    with codecs.open('myfile.tab', 'r', 'utf8') as f:
        data = [row.rstrip('\r\n').split('\t') for row in f]
    cur.executemany("INSERT INTO Corpus (sid, scid, sent) VALUES (?, ?, ?);", data)

    con.commit()
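
The `?` placeholders are what make `executemany` safe for this kind of data: the values are bound by the driver instead of being pasted into the SQL string, so quotes inside a sentence need no escaping. Here is a small self-contained sketch in Python 3, using an in-memory database and made-up rows (the first sentence is invented for illustration; the table again assumes `INTEGER PRIMARY KEY` as in the question's desired schema).

```python
import sqlite3

con = sqlite3.connect(":memory:")  # throwaway database for the demonstration
con.execute("CREATE TABLE Corpus(sid INTEGER PRIMARY KEY, scid INTEGER, sent TEXT)")

# Made-up rows; the first sentence contains both kinds of quotation mark,
# which would break an INSERT assembled by string concatenation.
rows = [
    ("100004", "100", "It's a \"quoted\" sentence"),
    ("100005", "100", "您可以慢慢探究菜单上所有的秘密惊喜。"),
]
con.executemany("INSERT INTO Corpus (sid, scid, sent) VALUES (?, ?, ?)", rows)
con.commit()

stored = con.execute("SELECT sid, sent FROM Corpus ORDER BY sid").fetchall()
```

The ids are passed as strings here, just as they come out of `split('\t')`; the integer columns convert them on insert.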
