Tuesday, September 13, 2016

Visual Studio Online re-branded Visual Studio Team Services (VSTS)

I do not normally write on marketing news or care about marketing news especially from November of last years. Since I have written extensively on VSOL (Visual Studio Online or VSO as others call it) it is important to note, VSOL has been re-branded VSTS -- Visual Studio Team Services.

There are plenty of links to the re-branding (November 18, 2016: Microsoft Renames Visual Studio Online As Visual Studio Team Services) which affects me because I need to shift acronyms.

"Le Roi est mort, vive le Roi!" (The King is dead, long live the King) the French had it right in 1422 when Charles VII was coronated  replacing is father Charles VI.

Back to coding.

Monday, September 5, 2016

Calling Python from .NET/C#

In case you want to jump ahead and just see how the code works, the source code (Visual Studio solution, projects, C# code and python scripts) for this post can be found in Github.com at https://github.com/softwarepronto/Blog under the CSharpPythonMaster folder.

This posting will demonstrate how to invoke a Python script using C# and the Process class from the System.Diagnostic namespace. It is assumed Python is already installed on the machine on which a C# application will invoke the python script. An example of the steps required to install Python on Windows can be found at: Python: Installing on Windows. The C# application can communicate directly with the Python script by passing in command-line arguments. The Python script will communicate with the C# executable by writing to standard output. Obviously communication could be handled between C# and Python using files or sockets or dozens of other mechanisms.

C# is able to invoke any scripting language including Python. With regards to Python, running a script simply a matter of invoking (from C#) the python.exe executable and passing in a command-line argument corresponding to the name of the python script to be run. An example Python script, Python101.py, is as follows:


Note that the previous code displays all the arguments passed to Python during execution (sys.argv corresponds to the command-line arguments).

This script can be invoked from any console window provided the Python-related installation folders were added to the PATH variable during installation (see the following for more details: Python: Installing on Windows). The script is simple enough to execute as is shown below:


The general idea to run Python from C# is to include "using System.Diagnostics;" at the top of the C# file (making the items within the System.Diagnostics namespace available). The ProcessStartInfo class is used to specify the executable filename ("python") and the lone command-line argument ("Python101.py") to be invoked by the Process class instance's Start method. An example of the rough code is follows:

    ProcessStartInfo processStartInfo = new ProcessStartInfo()
    {
        Arguments = "Python101.py",
        FileName = "python"
    };

    using (Process process = new Process())
    {
        process.StartInfo = processStartInfo;
        process.Start();
        process.WaitForExit();
        process.Close();
    }

Recall also that we wanted to have C# specify command-line options to be processed by Python so more realistic rough code is as follows:

    private static void RunPythonScript(
                            string script, string scriptArgs)
    {
        ProcessStartInfo processStartInfo = new ProcessStartInfo()
        {
            Arguments = _pythonScriptToExecute + " " + scriptArgs,
            FileName = "python",
        };

        using (Process process = new Process())
        {
            // assign start information to the process 
            process.StartInfo = processStartInfo;
            process.Start();
            process.WaitForExit();
            process.Close();
        }
    }

    const string _pythonScriptToExecute = "Python101.py";

    private static void InvokePythonScript()
    {
        RunPythonScript(_pythonScriptToExecute, "Arg0 Arg1 Arg2");

The previous code handles passing command-line arguments from C# to Python (see the code demarcated by boldface). What is missing in the previous code is to have C# handle the processing of the standard output generated by the Python script.

In order for a C# application to read the standard output stream from Python or any application the ProcessStartInfo class's properties must be set as follows:
    UseShellExecute = false,
    RedirectStandardOutput = true,

Reading standard output adds complexity. There are deadlock scenarios that take place where the parent (the C# application) can wait forever for the child process (the Python script) to write to standard output. The documentation for RedirectStandardOutput (ProcessStartInfo.RedirectStandardOutput Property)states:


There are a dozen questions asked on Stackoverflow.com on the deadlock topic related to reading standard output form a child process spawned with Process.Start. The simplest way to address it is to:
  1. Insure that both standard output and standard error area read completely
  2. Only read the streams associated with standard output and standard error using asynchronous methods
An example that invokes a Python script, passes in command-line parameters to the script and reads standard output from the script without deadlocking needs a bit of infrastructure. The following events are used to indicate when the asynchronously invoked methods have read standard output and standard error:

private static AutoResetEvent _doneHandlingSTDOUT = 
                                  new AutoResetEvent(false);

private static AutoResetEvent _doneHandlingSTERR = 
                                  new AutoResetEvent(false);

private static AutoResetEvent[] _allToWaitOn = 
    { _doneHandlingSTDOUT, _doneHandlingSTERR };

The following methods are invoke asynchronously to handle the reading of standard error and standard output respectively:

private static void HandleSTDERR(
            object sendingProcess,
            DataReceivedEventArgs stderr)
{
    // Empty stream so done handling standard error stream
    if (String.IsNullOrEmpty(stderr.Data))
    {
        _doneHandlingSTERR.Set();
    }

    else
    {
        Console.Write("There was an error: ");
        Console.WriteLine(stderr.Data);
    }
}

private static void HandleSTDOUT(
            object sendingProcess,
            DataReceivedEventArgs stdout)
{
    // Empty stream so done handling standard output stream
    if (String.IsNullOrEmpty(stdout.Data))
    {
        _doneHandlingSTDOUT.Set();
    }

    else
    {
        Console.WriteLine(stdout.Data);
    }
}

The previous methods receive text form standard error and standard output respectively from the parameter of type DataReceivedEventArgs which exposes the Data property. The Data property contains a value of null when there is no data left in the stream. When Data equals null, each of the previous methods sets an event indicating that processing the stream is complete.

The code that invokes the Python script using Process.Start and that also sets up to asynchronously read of standard output and standard error is as follows:

private static void RunPythonScript(
                         string script, string scriptArgs)
{
    ProcessStartInfo processStartInfo = new ProcessStartInfo()
    {
        Arguments = script + " " + scriptArgs,
        FileName = "python",
        // can only redirect STDIO when UseShellExecute=false
        UseShellExecute = false, 
        RedirectStandardOutput = true,
        RedirectStandardError = true
    };

    using (Process process = new Process())
    {
        process.StartInfo = processStartInfo;
        process.OutputDataReceived += HandleSTDOUT;
        process.ErrorDataReceived += HandleSTDERR;
        if (!process.Start())
        {
            Console.WriteLine("Process failed to start.");

            return;
        }
        process.BeginErrorReadLine();
        process.BeginOutputReadLine();
        process.WaitForExit();
        Console.WriteLine("Process exit code:" + process.ExitCode);
        process.Close();
    }

    WaitHandle.WaitAll(_allToWaitOn);
}

The key elements of the previous method are to set the ProcessStartInfo properties as:
        UseShellExecute = false, 
        RedirectStandardOutput = true,
        RedirectStandardError = true

The Process class instance in the previous method exposes two events, OutputDataReceived and ErrorDataReceived, that are assigned to the the methods used to handle the asynchronous reading of standard output and standard error:
        process.OutputDataReceived += HandleSTDOUT;
        process.ErrorDataReceived += HandleSTDERR;

After Process.Start is invoked the following methods of the Process instance must be called in order begin the asychronsous reading of standard output and standard error:
        process.BeginErrorReadLine();
        process.BeginOutputReadLine();

What is provided is a C# shell sufficient to invoke a Python script and read the standard output generated from said script. 

Detecting Errors in the Invoked Python Script

The source code on Github includes an example of invoking a Python script, JustGarbage.py, that will generate an error when run by Python.exe:


When a script is invoked Process.Start will return false if the process failed to start. Additional the Process class's Exit code property should return non-zero on an error:
        if (!process.Start())
        {
            Console.WriteLine("Process failed to start.");

            return;
        }
        process.BeginErrorReadLine();
        process.BeginOutputReadLine();
        process.WaitForExit();
        Console.WriteLine("Process exit code:" + process.ExitCode);

It makes sense that Process.Start returns true when Python.exe is passed an nonsensical Python script like JustGarabage.py because Python does successfully get invoked  The invoked process, Python.exe, will detect and report on the error. The value for ExitCode when running the nonsensical script is one indicating an error. Python writes text to standard error indicating an error occurred. The output from th C# application is as follows with JustGarbage.py is specified as the Python script to run:


For those who know music, the Python script, JustGarbage.py, contains the names of the members of the band Garbage. 

Can Process.Start invoke the Python script directly by assigning it to ProcessStartInfo.FileName?

It is not required that ProcessStartInfo.FileName be set to "Python.exe". Instead the actually value of the python script could be specified. To demonstrated this consider the following Python script which creates a file:

import os
import sys

scriptpath = os.path.dirname(os.path.realpath(__file__))
filename = os.path.join(scriptpath, sys.argv[1])
print(filename)
file = open(filename, "w")
file.write("She came from Greece. She had a third for knowledge.")
file.close()

The file create in the previous script is named using first command-line parameter passed into the script (sys.argv[1]).

The C# code to invoke this script is follows where the script name is ProofByFile.py and the name of the Python script is assigned directly to ProcessStartInfo.FileName:

const string _pythonScriptToExecuteDirectly = "ProofByFile.py";

private static void RunScriptDirectly()
{
    string filenameToCreate = 
               DateTime.Now.ToString("yyyyMMddhhmmssfff") + ".txt";
    ProcessStartInfo processStartInfo = new ProcessStartInfo()
    {
        Arguments = filenameToCreate,
        FileName = _pythonScriptToExecuteDirectly,
    };

    using (Process process = new Process())
    {
        process.StartInfo = processStartInfo;
        process.Start();
        process.Close();
    }

    string commandLineOfExecutable = 
        Path.GetDirectoryName(Environment.GetCommandLineArgs()[0]);
    string fileCreatedByPython = 
        Path.Combine(commandLineOfExecutable, filenameToCreate);
    bool fileFound = false;

    for (int i = 0; i < 5; i++)
    {
        fileFound = File.Exists(fileCreatedByPython);
        if (fileFound)
        {
            break;
        }

        Thread.Sleep(100);
    }

    if (fileFound)
    {
        Console.WriteLine("Python created file as expected.");
    }

    else
    {
        Console.WriteLine(
            "Python did not create the file as expected.");
    }
}

Notice at the end of the previous code a delay is used to give the Python script a chance to commit the file craete. This example of polling and Thread.Sleep is not meant to be an example of production code. 

The reason it was possible to invoke ProofByFile.py directly in the previous code was because of how ProcessStartInfo was configured (see the code demarcated by boldface above). By default the UseShellExecute property of ProcessStartInfo  is set to true. This means that the previous code executed using with UseShellExecute=true.

Under the covers this means that Windows used the ShellExecute function to invoke the Python script. If UseShellExecute had been set to false then Windows would have used the CreateProcess function to attempt to invoke the script. It turns out that invoking from the Windows shell (the ShellExecute function) can run the Python script, ProofByFile.py. This is because the shell uses the "PY" file extension to look up the executable used to run said extension  which is Python.exe.

The original premise was that the C# application would receive data from the Python script using standard output from the invoked script. It is only possible for C# to access standard output from the child process when UseShellExecute=false. This is why the original C# example invoked Python.exe and not the Python script directly.

Thursday, September 1, 2016

Python: Installing on Windows

Python for Windows can be found at: Python Releases for Windows

Click on the the previous link reveals the following:


A 64-bit machine requiring Python 3.6 can download

  • Windows x86-64 web-based installer: small foot installer is downloaded and the components are pulled from the web during install. This approach requires an active Internet connection.
  • Windows x86-64 executable installer: a complete installer which when run can install Python without requiring a connection to the Internet. 

Invoking Windows x86-64 executable installer displays the following start screen:


The the bottom of the screen is a checkbox "Add Python 3.6 to PATH" which adds path required to execute the Python executable to the system PATH environment variable. Selecting this checkbox is as follows:


From the candidate install links select "Customize installation" which displays the following screen;


None of the options shown in the previous screenshot will adversely affect Windows or other development environments so they can be safely installed. Clicking on Next displays the following:


None of the Advanced Options shown in the previous screenshot will adversely affect Windows or other development environments so they can be safely selected.Notice below the the install location is in a locale accessible to all users. After selecting all Advanced Options the screen appears as follows:


Click on Install to being the installation of Python:


The previous screen is displayed while installation is in process. One installation is completed, the the following screen is displayed:



Once Python is installed the following are added to the environment variable PATH:
C:\Program Files\Python36\Scripts\;C:\Program Files\Python36\;

The PATHEXT is also updated include PY and PYW (see below):
PATHEXT=.COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC;.PY;.PYW

By adding the PY and PYW extensions to PATHEXT, means that Windows will treat files with these extensions like executables. To understand this consider the following file, HelloWorld.PY displayed in Python's IDLE editor:


From the command-line it is possible simply run HelloWorld.py which executes because Python.exe is found in PATH, the PY extension is treated as an executable due to PATHEXT and the PY extension is mapped to the execution of the Python.exe command:




Tuesday, August 23, 2016

ADO.NET: Reading CSV Files with ADO.NET fails when a Filename contains Special Characters

Most of my blog postings are about success meaning I demonstrate how do perform a certain task. This is a blog posting about failure, namely it is impossible to use ADO.NET to read CSV file into a DataTable if the file name has special characters. This posting is to demonstrate one of pitfalls associated with a previously discussed top ADO.NET: Reading CSV Files.

To understand the limitation of ADO.NET parting a CSV file using the Microsfot Jet OLEDB provider  (Microsoft.Jet.OLEDB.4.0) consider the following CSV file:
    AdventureWorks.Person.Person.csv

Do not mistake the file for a table in AdventureWorks using a name qualified by database, schema and database object name  (Database.Schema.DBObject). AdventureWorks.Person.Person.csv is a file residing on disk.

The code that should be able to generate a DataTable from the AdventureWorks.Person.Person.csv CSV file is as follows:

public static DataTable GetDataTableFromCSV(
                          string folderPath, string csvFileName)
{
    DataTable table = new DataTable();
    string connectionText = String.Format(
        "Provider=Microsoft.Jet.OLEDB.4.0;Data Source='{0}'" +
             ";Extended Properties='text;HDR=Yes';",
        folderPath);
    string commandText = String.Format(
                       "SELECT * FROM [{0}]", csvFileName);

    using (OleDbConnection connection = 
                  new OleDbConnection(connectionText))
    {
        connection.Open();
        using (OleDbDataAdapter command = new 
                OleDbDataAdapter(commandText, connection))
        {
            command.Fill(table);
            command.Dispose();
        }

        connection.Close();
    }

    return table;
}

private const string _csvFilename = 
         "AdventureWorks.Person.Person.csv";

static void Main(string[] args)
{
    DataTable table;
    string executableFilename = Environment.GetCommandLineArgs()[0];
    string executablePath = 
              Path.GetDirectoryName(executableFilename);

    try
    {
        string fullyQUalifiedFilename = 
              Path.Combine(executablePath, _csvFilename);

        if (File.Exists(fullyQUalifiedFilename))
        {
            Console.WriteLine("File found on disk: " + 
                              fullyQUalifiedFilename);
        }

        table = GetDataTableFromCSV(executablePath, _csvFilename);
    }

    catch (Exception ex)
    {
        Console.Error.WriteLine("Error: " + ex.Message);
    }
}

The previous code tests using File.Exists that the file exists on disk:
    if (File.Exists(fullyQUalifiedFilename))
    {
        Console.WriteLine("File found on disk: " + 
                          fullyQUalifiedFilename);
    }

The File.Exists method in the previous code returns try as the file exists on disk.

The command passed to the OleDbDataAdapter object is as follows:
    string commandText = String.Format(
                       "SELECT * FROM [{0}]", csvFileName);

Notice the that the filename is surrounded by square brackets. The square brackets should allow the query to  handling filenames that contain special characters such as periods or spaces. The previous command text generates an exception at the line of code indicated below (command.Fill(table);):
    string commandText = String.Format(
                     "SELECT * FROM [{0}]", csvFileName);

    using (OleDbConnection connection = 
              new OleDbConnection(connectionText))
    {
        connection.Open();
        using (OleDbDataAdapter command = 
              new OleDbDataAdapter(commandText, connection))
            {
                command.Fill(table); // Exception generated here
                command.Dispose();
            }

The irony is that the follow T-SQL executes successfully when a table name contains period and a space:
    SELECT * FROM [NameWithPeriod.AndA Space]

To further test ADO.NET's accessing CSV files with the jet provider the following files will be tested:
  • FileWithA Space.csv: succeeds when surrounded by square brackets
  • FileWithA-Dash.csv: succeeds when surrounded by square brackets
  • FileWithLone.Period.csv: fails even when surrounded by square brackets
A second test will be run on the files but this time using the following command text where the CSV file name is not surrounded by square brackets:
    string commandText = String.Format(
                       "SELECT * FROM {0}", csvFileName);

When the application is running without the square brackets the following behavior is exhibited for the same CSV files:
  • FileWithA Space.csv: fails
  • FileWithA-Dash.csv: fails
  • FileWithLone.Period.csv: fails

The error generated by attempting to access the previous CSV files without using square brackets is as follows:
    Syntax error in FROM clause.

Conclusion

Clearly using square brackets around the CSV file name in the command text should be part of any project using ADO.NET to parse CSV files. The square brackets allow spaces in the filename and dashes in the filename.

Recall in an early entry the 2004 MSDN article was referenced as introducing accessing CSV files using ADO.NET and the Jet OLEDB Provider: Much ADO About Text Files. Be aware that previously referenced article does not use square brackets. As was mentioned in a previous post, the article does not surround the folder path in the connection string with single quotes. The single quotes around the folder path in the connection string and the square brackets around the CSV filename in the command text should be standard operation procedure when coding using this technology.

Accessing Source Code

The source code for this project can be found at GitHub at https://github.com/softwarepronto/Blog. The solution containing the code is ADODotNetReadingCSVFiles. The projects of note are:
  • CSVToDataTableUsingADODotNetBadFilename
  • CSVToDataTableUsingADODotNetAFewBadFilenames

Monday, August 22, 2016

Web API 2: C# Upload File using HTML, CURL and .NET Clients

This posting presents the code used on the server-side to receive a file from a web client (Web API 2, C#). Also shown will be  three clients: HTML, C#/.NET and CURL. As I've said previous, CURL is my preferred HTTP client as it is platform agnostic. An IOS developer should be able to reverse engineer any web service example written in CURL.

HTML Client

Before presenting the server code used to upload a file, consider the following web page, detault.html, which allows a file to be uploaded:



The HTML code for the page is as follows:

<!DOCTYPE html>
<html>
<head>
    <title>Test file upload</title>
    <meta charset="utf-8" />
</head>
<body>
    <form action="http://localhost:4323/API/Files"
          enctype="multipart/form-data" method="post">
        <p>
            Select a file to upload:<br>
            <input type="file" name="datafile" size="50">
        </p>
        <div>
            <input type="Submit" value="Send">
        </div>
    </form>
</body>
</html>

The page contains a form which will perform an HTTP POST to the development URL:
    http://localhost:4323/API/Files

The page contains an input control of type file (named "datafile") which is clearly visible in the screenshot. The server side code needs to receive an HTTP POST and process the file accordingly.

Web Service

A controller named Files was was implemented in the web service which means the URL used by the HTML client makes sense as the web service URL contains API\Files:
    http://localhost:4323/API/Files

The code is as follows where the method implemented is Post() corresponding POST protocol used by the HTML client:

public class FilesController : ApiController
{
    private const string _fileUploadName = "datafile";

    public int Post()
    {
        string serverFolder = HostingEnvironment.MapPath(
                                      "~/Uploads/");
        HttpFileCollection files =       
             HttpContext.Current.Request.Files;
        HttpPostedFile postedFile = files[_fileUploadName];

        if (postedFile == null)
        {
            return 0;
        }

        string destinationFilename = Path.Combine(
                    serverFolder, 
                    Path.GetFileName(postedFile.FileName));

        if (File.Exists(destinationFilename))
        {
            return 0;
        }

        postedFile.SaveAs(destinationFilename);

        return 1;
    }
}


The following code creates path to the Uploads folder such that is resides under the web services folder:
    string folder = HostingEnvironment.MapPath("~/Uploads/");

The request, System.Web.HttpContext.Current.Request, HttpRequest contains a Files collection corresponding to the file or files uploaded:
    HttpFileCollection files = 
        System.Web.HttpContext.Current.Request.Files;

The file is stored in the Files collection with key "datafile":
    private const string _fileUploadName = "datafile";

    HttpPostedFile postedFile = files[_fileUploadName];

The HttpPostedFile class contains a SaveAs method that allows the files to be saved to physical media on the server:
    postedFile.SaveAs(destinationFilename);

Curl Client

The CURL code to upload a file is as follows:
curl -X POST -Fdatafile=@FileToUpload.txt http://localhost:4323/API/Files

The -X POST command-line parameters means CURL will execute a POST message. The -F is the form data where the input is named "datafile" and is associated with a file, @FileToUpload.Txt. The @ prefix indicates a file. The file resides in the same folders as the CURL script. The FileToUpload.txt is ultimately what gets uploaded.

C# Client

The C# client that uploads a file is follows:

private const string _fileToUpload = "FileToUpload.txt";

private const string _fileUploadName = "datafile";

private const string _uploadEndPoint = @"http://localhost:4323/API/Files";

static void Main(string[] args)
{
    using (HttpClient httpClient = new HttpClient())
    {
        using (MultipartFormDataContent content = 
                         new MultipartFormDataContent())
        {
            using (FileStream stream = File.Open(
                  _fileToUpload, FileMode.Open, FileAccess.Read))
            {
                using (StreamContent streamConent = 
                                    new StreamContent(stream))
                {
                    Task taskUpload;

                    content.Add(
                      streamConent, _fileUploadName, _fileToUpload);
                    taskUpload = httpClient.PostAsync(
                          _uploadEndPoint, content);
                    taskUpload.Wait();
                    if (taskUpload.Status == 
                                TaskStatus.RanToCompletion)
                    {
                        Console.WriteLine("File uploaded");
                    }

                    else
                    {
                        Console.WriteLine("File uploaded");
                    }
                }
            }
        }
    }
}

The HTTP client class (the class that performs the actualy post) is called (appropriately) HttpClient:
    HttpClient httpClient

The variable corresponding to the web forms's controls is MultipartFormDataContent (the form content) which is declared as follows:
    using (MultipartFormDataContent content = 
                         new MultipartFormDataContent())

The physical file is opened as a FileStream and that FileStream is associated with a StreamContent instance used to associated the file as part of the Form content submitted:

    using (FileStream stream = File.Open(
                  _fileToUpload, FileMode.Open, FileAccess.Read))
    {
        using (StreamContent streamConent = 
                                    new StreamContent(stream))

When the stream content is associated with the form the Add method is invoked. The second parameter of the Add method is passed the value _fileUploadName. The data associatd with _fileUploadName is filedata which was the name of the file input control in the original HTML client demonstrated:
    content.Add(streamConent, _fileUploadName, _fileToUpload);

The HTTP POST supported by an asynchronous method within HttpClient:
    taskUpload = httpClient.PostAsync(_uploadEndPoint, content);
    taskUpload.Wait();

Source Code

The source code for this project can be found in GitHub at https://github.com/softwarepronto/Blog:
  • C# Web Service: DemoWebService01
  • HTML Upload Gage: DemoWebClient01\default.html
  • C# file upload client: Client project
  • Curl file upload script: Tests\Local\JSON\PostAFile.bat

Friday, August 19, 2016

ADO.NET: Reading CSV Files with ADO.NET fails for .NET 4.0 using Solution Platforms x64 or Any CPU

My development machine runs 64-bit Windows 10. The current application I was adding features to is C#, .NET 4.0 configured to build with Solution Platform, Any CPU. This means that on a 64-bit machine (like my development machine) the code will run as a 64-bit executable. On a 32-bit machine the code will run as a 32-bit executable. The feature I added was the ability to read a CSV file from disk and manipulate the data using a DataTable. This is the same functionality presented in "ADO.NET: Reading CSV Files."

The code (show below) generates an exception at the line connection.Open(). The message associated with the exception is:
The 'Microsoft.Jet.OLEDB.4.0' provider is not registered on the local machine.


The code which demonstrates this error but of more importance is the version of .NET the application is built with (.NET 4.0) and the Solution Platform (Any CPU). It should be noted that a Solution Platform of x64 would also generate the same exception:


There is no real solution to this issue is there is no 64-bit version of the driver for .NET 4.0 Obviously changing the platform solution to x86 will permit the code to run. That is not a solution. It is a work around. Also upgrading the code to run with 4.5 or later would fix the issue but the only reason this is an issue is that there a requirement to develop a 64-bit application using .NET 4.0.

Changing the version of .NET for the project  can be addressed using Project Properties, the Applications tab:



Changing the platform target for the project to x86 can be addressed using Project Properties, the Build tab:


The code for this application (which obviously generates an exception) can be found on Github at https://github.com/softwarepronto/Blog, The solution is named ADODotNetReadingCSVFiles and the project is named CVSToDataTableUsingADODotNet4.0.




Thursday, August 18, 2016

ADO.NET: Reading CSV Files

It is 2016 and generating CSV files is a mature technology (can it really be called a technology?). For example ADO.NET has been able to consume a CSV file and create a corresponding DataTable for at least a dozen years (see the MSDN article from March 5, 2004, Much ADO About Text Files).  This post will cover the correct way to use ADO.NET to access CSV files and transform said files into an ADO.NET DataTable.

How not to do it

The impetus for this blog posting is a project I was assigned which accessed CSV files all the wrong ways. I was taken aback recently with a 2016  era project when I saw it include code to read each line in a text file (System.IO.File.ReadAllLines) and use Array.Split to process the CSV and at the same time to do it in a flawed manner. An example of this travesty against Alan Turing and all things good is as follows:

private static DataTable ReadCSVFile()
{
    string[] rows = File.ReadAllLines(GetCSVFilename());
    string[] columnNames = null;
    DataTable table = new DataTable();

    if (rows.Length == 0)
    {
        return table; // error empty CSV file
    }

    columnNames = rows[0].Split(',');
    foreach (string columnName in columnNames)
    {
        table.Columns.Add(columnName);
    }

    object[] rowValues = new object[columnNames.Length];

    // start from 1 b/c rowindex=0 is the header row of column names
    for (int rowIndex = 1; rowIndex < rows.Length; rowIndex++)
    {
        string rowLine = rows[rowIndex];
        // flaw in code -- any string containing a column will 
        // generate an extra column
        string[] columnValues = rowLine.Split(',');

        if (columnNames.Length != columnValues.Length)
        {
          throw new Exception(
            "Split() failed because there is a comma in the data.");

        }

        for (int columnIndex = 0; 
             columnIndex < columnValues.Length; 
             columnIndex++)
        {
            rowValues[columnIndex] = columnValues[columnIndex];
        }

        table.Rows.Add(rowValues);
    }

    return table; 
}

Pay close attention to the code marked in boldface which is a core flaw in how the CSV values are parsed. This bug is the reason it is better to use established technologies. The  bug show up at line 291 due this text data that contains a column:
  For urgent issues, do not send e-mail

How to access a CSV file using ADO.NET's OLE DB Provider

This article will demonstrated a cleaner way to convert a CSV file into a DataTable. The previously referenced MSDN article that demonstrates CSV file to DataTable code is actually flawed with respect to how connections to CSV files are handled and how the command text related to accessing values is written.  The example presented here will address these issues and said example is as follows:

public static DataTable GetDataTableFromCSV(string folderPath, 
                                            string csvFileName)
{
    DataTable table = new DataTable();
    string connectionText = String.Format(
        "Provider=Microsoft.Jet.OLEDB.4.0;Data Source='{0}';" 
          "Extended Properties='text;HDR=Yes';",
        folderPath);
    string commandText = String.Format(
        "SELECT * FROM [{0}]", csvFileName);

    using (OleDbConnection connection = 
                  new OleDbConnection(connectionText))
    {
        connection.Open();
        using (OleDbDataAdapter command = 
             new OleDbDataAdapter(commandText, connection))
        {
            command.Fill(table);
            command.Dispose();
        }

        connection.Close();
    }
}

The connection text in the previous code includes the folder in which the CSV file resides. The folder path is assigned to the Data Source property of the connection string as follows:
  "Provider=Microsoft.Jet.OLEDB.4.0;Data Source='{0}';" 
    "Extended Properties='text;HDR=Yes';",

Notice that the {0} String.Format parameters is bracketed by single quotes: '{0}'. This single quotes protect the code from any special characters in the folder path that would cause an error in the connection string.

The command text is a SQL SELECT where the CSV filename is the value specified for the FROM clause:
  "SELECT * FROM [{0}]"

Notice that the {0} String.Format parameters is bracketed by square brackets. In a perfect would this would protect the command execution from special characters in the filename.

Accessing the Source Code

Each project used in this application can be found in Github for user SoftwarePronto (https://github.com/softwarepronto/Blog). The solution associated with blog is:

  • ADODotNetReadingCSVFiles

The pertinent projects are:

  • BarbaricCSVAccess
  • CSVToDataTableUsingADODotNet 


Each project includes a sample CSV file that is part of the project and is copied to the build folder. The settings associated with supporting thsi were discussed in a previous posting, "Visual Studio Installer Projects: Installing PDFs and other Content with an Install Project".