4

I'm trying to output multiple lists of data, of varying length, to a CSV file. Each list should be a column in the output CSV file. Is there a straightforward way of doing this? If I were outputting each list as a row, I'd just loop over each list and output a line break when I hit the end, but this approach does not work column-wise.

I thought of going over all the lists at once, item by item and incrementing a counter, but this would also fail because some lists are longer than others. To remedy this I would have to check at each iteration whether the counter is past the end of each list, which would be fairly expensive in terms of computations.

Thanks for any ideas!

2
  • 1
    And what should happen if two lists have different lengths? Empty entry? Commented Dec 2, 2010 at 18:40
  • Most of the cost is writing to IO; how you do it is unlikely to be important. I suggest you write it the way you intended and not worry about performance (assuming you use sensible buffering). Commented Dec 2, 2010 at 18:45

8 Answers

2

I think this is pretty straight-forward:

public static void main(String... args) throws IOException {

    ArrayList<ArrayList<String>> rows = getRandomData();

    if (rows.size() == 0)
        throw new RuntimeException("No rows");

    // normalize data
    int longest = 0;
    for (List<String> row : rows)
        if (row.size() > longest)
            longest = row.size();

    for (List<String> row : rows)
        while (row.size() < longest)
            row.add("");

    if (longest == 0)
        throw new RuntimeException("No columns");

    // fix special characters
    for (int i = 0; i < rows.size(); i++)
        for (int j = 0; j < rows.get(i).size(); j++)
            rows.get(i).set(j, fixSpecial(rows.get(i).get(j)));

    // get the maximum size of one column
    int[] maxColumn = new int[rows.get(0).size()];

    for (int i = 0; i < rows.size(); i++)
        for (int j = 0; j < rows.get(i).size(); j++)
            if (maxColumn[j] < rows.get(i).get(j).length())
                maxColumn[j] = rows.get(i).get(j).length();

    // create the format string
    String outFormat = "";
    for (int max : maxColumn)
        outFormat += "%-" + (max + 1) + "s, ";
    outFormat = outFormat.substring(0, outFormat.length() - 2) + "\n";

    // print the data
    for (List<String> row : rows)
        System.out.printf(outFormat, row.toArray());

}

private static String fixSpecial(String s) {

    s = s.replaceAll("(\")", "$1$1");

    if (s.contains("\n") || s.contains(",") || s.contains("\"") || 
            s.trim().length() < s.length()) {
        s = "\"" + s + "\"";
    }

    return s;
}

private static ArrayList<ArrayList<String>> getRandomData() {

    ArrayList<ArrayList<String>> data = new ArrayList<ArrayList<String>>();

    String[] rand = { "Do", "Re", "Song", "David", "Test", "4", "Hohjoh", "a \"h\" o", "tjo,ad" };
    Random r = new Random(5);

    for (int i = 0; i < 10; i++) {

        ArrayList<String> row = new ArrayList<String>();

        for (int j = 0; j < r.nextInt(10); j++)
            row.add(rand[r.nextInt(rand.length)]);

        data.add(row);
    }

    return data;
}

Output (pretty ugly since it's random; note the escapes):

Re       , 4           , "tjo,ad" , "tjo,ad" ,    
"tjo,ad" , "a ""h"" o" ,          ,          ,    
Re       , "a ""h"" o" , Hohjoh   , "tjo,ad" , 4  
4        , David       ,          ,          ,    
4        , Test        , "tjo,ad" , Hohjoh   , Re 
Do       , Hohjoh      , Test     ,          ,    
Hohjoh   , Song        ,          ,          ,    
4        , Song        ,          ,          ,    
4        , Do          , Song     , Do       ,    
Song     , Test        , Test     ,          ,    

3 Comments

This is essentially what I had coded, but the pre-normalizing bit is interesting. It means I only need to check N times instead of N^2 times. Thanks!
and what if the string contains a comma?
@peter.murray.rust: fixed now. The previous implementation only fixed the column width; now it adds commas as well (and "escapes" the special characters: [\n|\"|,])
2

It's worth having a look at http://commons.apache.org/sandbox/csv/

This also references some other CSV libraries.

Note that many answers have not considered strings which contain commas. That's the sort of reason why libraries are better than doing it yourself.

4 Comments

+1 for being the first to suggest a CSV library. How come everybody thinks generating/parsing CSV is easy but nobody would write an XML parser?
Actually, I've coded xml parsers. This data actually needs to be output as a column-wise CSV for other people.
Thanks for the link! Seems like OpenCSV is pretty nice.
It's almost always better to look for libraries. How many people would think of quoting strings that contain commas? And in XML, how many parsers don't process parameter entities?
1

You can use String.format():

System.out.println(String.format("%4s,%4s,%4s", "a", "bb", "ccc"));
System.out.println(String.format("%4s,%4s,%4s", "aaa", "b", "c"));

The result is a fixed, right-aligned column width of 4 characters, as long as every value is at most 4 characters long. Longer values break the layout.

   a,  bb, ccc
 aaa,   b,   c


1

I'm not familiar with Java at all, but if you have a matrix oriented data type, you could fill the rows using easy looping, then transpose it, then write it out using easy looping. Your printing routine could handle null entries by outputting a null string, or fixed width spaces if you prefer.
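In Java, the transpose idea could look roughly like the sketch below. The class and method names (`Transpose`, `transpose`) are made up for illustration, and the sketch assumes the columns are random-access lists such as `ArrayList`, so that `get(r)` is cheap. It visits every output cell once, so the cost is linear in the size of the output, not O(1). Quoting of commas and quotes is deliberately left out here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; the names are illustrative only.
final class Transpose {

    // Turn a list of columns into a list of rows, padding short columns with "".
    static List<List<String>> transpose(List<List<String>> columns) {
        // Find the length of the longest column; that is the number of rows.
        int longest = 0;
        for (List<String> column : columns) {
            longest = Math.max(longest, column.size());
        }

        List<List<String>> rows = new ArrayList<>();
        for (int r = 0; r < longest; r++) {
            List<String> row = new ArrayList<>();
            for (List<String> column : columns) {
                // Columns that have run out contribute an empty entry.
                row.add(r < column.size() ? column.get(r) : "");
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<List<String>> columns = Arrays.asList(
                Arrays.asList("a", "b", "c"),
                Arrays.asList("1"));

        // Writing the transposed data is then the usual row-wise loop.
        for (List<String> row : transpose(columns)) {
            System.out.println(String.join(",", row));
        }
    }
}
```

Once the data is transposed, the row-wise loop from the question applies unchanged.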

1 Comment

That's what I was thinking initially. I just don't know that it'd be computationally efficient, particularly with the volume of data I have to output. If the transpose operation is O(1) or O(logN) then it might be worth it though. I'll have a look.
1

Create an array of iterators (one for each list.) Then loop over the array, checking if the iterator hasNext(); if it does, output iterator.next(). Outputting commas and newlines is trivial. Stop when all iterators have returned hasNext()==false.
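A minimal sketch of that iterator approach follows. The class name `ColumnCsv` and the methods `toCsv` and `escape` are hypothetical names chosen for the example, and the quoting rule (wrap the field in double quotes and double any embedded quotes) is an assumption about the CSV dialect wanted. It builds a string for simplicity; writing each line to a buffered writer would work the same way.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper class; the names and signatures are illustrative only.
final class ColumnCsv {

    // Quote a field if it contains a comma, a quote, or a line break,
    // doubling any embedded quotes.
    static String escape(String field) {
        if (field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    // Build CSV text in which each input list becomes one column.
    static String toCsv(List<List<String>> columns) {
        // One iterator per column.
        List<Iterator<String>> iterators = new ArrayList<>();
        for (List<String> column : columns) {
            iterators.add(column.iterator());
        }

        StringBuilder out = new StringBuilder();
        while (true) {
            StringBuilder line = new StringBuilder();
            boolean anyValue = false;
            for (int c = 0; c < iterators.size(); c++) {
                if (c > 0) {
                    line.append(',');
                }
                Iterator<String> it = iterators.get(c);
                if (it.hasNext()) {
                    line.append(escape(it.next()));
                    anyValue = true;
                }
                // An exhausted column simply contributes an empty field.
            }
            if (!anyValue) {
                break; // every iterator is exhausted, so stop
            }
            out.append(line).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<List<String>> columns = Arrays.asList(
                Arrays.asList("a", "b", "c"),
                Arrays.asList("1"),
                Arrays.asList("x,y", "z"));
        System.out.print(toCsv(columns));
    }
}
```

Because only `hasNext()` and `next()` are used, this works equally well for `LinkedList` and `ArrayList` columns, and no index bookkeeping is needed.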


0

You can do something like this:

List<List<?>> listOfLists = new LinkedList<List<?>>(); 
List<Iterator<?>> listOfIterators = new LinkedList<Iterator<?>>(); 
for (List<?> aList : listOfLists) {
         listOfIterators.add(aList.iterator()); 
}        
boolean done = false;        
while(!done) 
{   
      done = true;  
      for (Iterator<?> iter : listOfIterators)  
      {         
          if (iter.hasNext())       
          {             
             Object obj = iter.next();          
             //PROCESS OBJ          
             done = false;      
          }         
          else      
          {             
             //PROCESS EMPTY ELEMENT        
          }     
       } 
}

For CSV processing I have used this library several times: http://www.csvreader.com/java_csv.php Very simple and convenient.

Cheerz!


0

I would have to check at each iteration whether the counter is past the end of each list, which would be fairly expensive in terms of computations.

Get over it. This will, realistically, be small compared to the cost of actually doing the iteration, which in turn will be tiny compared to the cost of writing any given bit of text to the file. At least, assuming you have random access containers.

But you shouldn't be thinking in terms of a counter and indexing anyway; you should be thinking in terms of iterators (which sidestep the random-access question and simplify the code).

1 Comment

Getting over it is certainly very helpful in many cases. Unfortunately it still doesn't answer the question. It simply shows that you disagree with its premise.
0

If you wanted to do this in one pair of loops and one method, you could do the following.

public static void writeCSV(PrintWriter pw, List<List<String>> columnsRows) {
    for(int i=0;;i++) {
        StringBuilder line = new StringBuilder();
        boolean empty = true;
        for (List<String> column : columnsRows) {
            String text = i < column.size() ? column.get(i) : "";
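            // empty stays true only if every column is exhausted at index i
            empty &= i >= column.size();
            // quote fields with separators, quotes, newlines, or leading/trailing whitespace
            if (text.contains(",") || text.contains("\"") || text.contains("\n") || !text.trim().equals(text))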
                text = '"' + text.replaceAll("\"", "\"\"") + '"';
            line.append(text).append(',');
        }
        if (empty) break;
        pw.println(line.substring(0, line.length()-1));
    }
}

As an exercise, you could do this with one loop, but it wouldn't be as clear what it's doing.

Using the sample data from @dacwe, this method takes 10 µs (microseconds).

