Sentiment Analysis Example with ML.NET in C#

   As you may already know, Microsoft ML.NET is an open-source machine learning framework for .NET developers. ML.NET provides various machine learning models for classification, regression, and other data analysis problems. In this post, we'll learn how to classify sentiment polarity with an ML.NET binary classification model in C#. The tutorial covers:
  1. Preparing the tools
  2. Preparing data
  3. Building the model
  4. Predicting sentiment text

Preparing the tools

   We'll create a new console application (.NET Framework) project in Visual Studio, then install the Microsoft.ML package from the NuGet package manager. On my machine, I use Visual Studio with .NET Framework 4.7.1 and Microsoft.ML version 0.11. ML.NET is updated frequently and its API changes between releases, so make sure that you are using the same version.



We'll include the required namespaces.

using Microsoft.ML;
using Microsoft.ML.Data;
using System;
using System.Collections.Generic;
using System.Linq;


Preparing data

   I prepared a simple sentiment data set to train the model in this tutorial. It contains imaginary user opinions, where a positive opinion is labeled '1' and a negative opinion '0'. It is a tab-separated text file with a binary label column and a sentiment text column. Below is a sample of the sentiment training data.

Label Text
1 exciting show
1 amazing performance
1 it is great!
1 I am excited a lot
1 it is terrific
1 Definitely good one
1 Excellent, very satisfied
1 Glad we went
1 Once again outstanding!
1 awesome! excellent show
1 This is truly a good one!
0 it's mediocre!
0 Not good at all!
0 It is rude
0 I don't like this type
0 poor performance
0 Boring, not good at all!
0 not liked
0 I hate this type of things
... 


You can find the full sentiment text at the end of this post. Copy the text and save it as SentimentText.tsv in your target folder. When you save the file, make sure that the Label and Text columns are separated by a tab. The path to the file is kept in a static field.

private static readonly string DataPath = @"C:\tmp\SentimentText.tsv";
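
Because a space in place of the tab is easy to miss, it can help to sanity-check the file before training. The following is a minimal sketch that uses only the base class library. The TsvCheck class and its FindProblems method are hypothetical names introduced here for illustration; they are not part of ML.NET.

```csharp
using System.Collections.Generic;

// Hypothetical helper (not part of ML.NET): reports rows that do not have
// exactly two tab-separated columns or whose label is not "0" or "1".
public static class TsvCheck
{
    public static List<string> FindProblems(IEnumerable<string> lines)
    {
        var problems = new List<string>();
        int lineNo = 0;
        foreach (var line in lines)
        {
            lineNo++;
            // Skip the header row and any blank lines.
            if (lineNo == 1 || string.IsNullOrWhiteSpace(line)) continue;

            var parts = line.Split('\t');
            if (parts.Length != 2)
                problems.Add($"Line {lineNo}: expected 2 tab-separated columns, found {parts.Length}");
            else if (parts[0] != "0" && parts[0] != "1")
                problems.Add($"Line {lineNo}: label must be 0 or 1, found '{parts[0]}'");
        }
        return problems;
    }
}
```

One way to use it would be to call TsvCheck.FindProblems(System.IO.File.ReadLines(DataPath)) and print each returned message; an empty list means every row has the expected shape.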


Building the model

First, we need to create sentiment data and prediction container classes.

public class SentimentIssue
{
    [LoadColumn(0)]
    public bool Label { get; set; }
    [LoadColumn(1)]
    public string Text { get; set; }
}

public class SentimentPrediction
{
    [ColumnName("PredictedLabel")]
    public bool Prediction { get; set; }
    public float Probability { get; set; }
    public float Score { get; set; }
} 

We'll start by creating MLContext.

var mlContext = new MLContext(seed: 1);

Then, we'll load the sentiment text file through mlContext and define a pipeline that turns the text into numeric features.

var data = mlContext.Data.LoadFromTextFile<SentimentIssue>(
    path: DataPath,
    hasHeader: true,
    separatorChar: '\t');

var dataProcessPipeLine = mlContext.Transforms.Text
    .FeaturizeText(outputColumnName: DefaultColumnNames.Features,
    inputColumnName: nameof(SentimentIssue.Text));

We use the StochasticDualCoordinateAscent trainer for binary classification.

var trainingPipeLine = dataProcessPipeLine
    .Append(mlContext.BinaryClassification
                        .Trainers.StochasticDualCoordinateAscent());

We'll check the model with 5-fold cross-validation and collect the AUC and accuracy of each fold.

var cvResults = mlContext.BinaryClassification
    .CrossValidate(data, estimator: trainingPipeLine, numFolds: 5);

var aucs = cvResults.Select(r => r.Metrics.Auc);
var accs = cvResults.Select(r => r.Metrics.Accuracy);

Finally, we fit (train) the model on the full data set.

var model = trainingPipeLine.Fit(data);


Predicting sentiment text

We can predict new sentiment data as shown below.

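var predEngine = mlContext.Model
                .CreatePredictionEngine<SentimentIssue, SentimentPrediction>(model);
// item is a SentimentIssue instance whose Text holds the opinion to classify
var resultprediction = predEngine.Predict(item);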


Here, I've tested the model with the sentiment texts below.

var opinions = new List<SentimentIssue>
{
    new SentimentIssue {Text = "This is an awful!"},
    new SentimentIssue {Text = "This is excellent!"},
    new SentimentIssue {Text = "I like it!"},
    new SentimentIssue {Text = "don't like this one"},
};

The result was as follows.



The result looks good. We can improve our model by adding more training data.
In this post, we've learned how to classify sentiment data with ML.NET in C#.
The full source code and test sentiment text are listed below.

using Microsoft.ML;
using Microsoft.ML.Data;
using System;
using System.Collections.Generic;
using System.Linq;

namespace SentimentDT
{
    public class SentimentIssue
    {
        [LoadColumn(0)]
        public bool Label { get; set; }
        [LoadColumn(1)]
        public string Text { get; set; }
    }

    public class SentimentPrediction
    {
        [ColumnName("PredictedLabel")]
        public bool Prediction { get; set; }
        public float Probability { get; set; }
        public float Score { get; set; }
    }

    class Program
    {        
        private static readonly string DataPath = @"C:\tmp\SentimentText.tsv";

        static void Main(string[] args)
        {
            var mlContext = new MLContext(seed: 1);
            var sentimentModel = BuildSentimentModel(mlContext);

            var opinions = new List<SentimentIssue>
            {
                new SentimentIssue {Text = "This is an awful!"},
                new SentimentIssue {Text = "This is excellent!"},
                new SentimentIssue {Text = "I like it!"},
                new SentimentIssue {Text = "don't like this one"},
            };

            PredictSentiment(mlContext, sentimentModel, opinions);

            Console.ReadKey();
        }                    

        private static ITransformer BuildSentimentModel(MLContext mlContext)
        {
            var data = mlContext.Data.LoadFromTextFile<SentimentIssue>(
                path: DataPath,
                hasHeader: true,
                separatorChar: '\t');

            var dataProcessPipeLine = mlContext.Transforms.Text
                .FeaturizeText(outputColumnName: DefaultColumnNames.Features,
                inputColumnName: nameof(SentimentIssue.Text));

            var trainingPipeLine = dataProcessPipeLine
                .Append(mlContext.BinaryClassification
                                 .Trainers.StochasticDualCoordinateAscent());
                      
            var cvResults = mlContext.BinaryClassification
                .CrossValidate(data, estimator: trainingPipeLine, numFolds: 5);

            var aucs = cvResults.Select(r => r.Metrics.Auc);
            var accs = cvResults.Select(r => r.Metrics.Accuracy);
            var model = trainingPipeLine.Fit(data);
            Console.WriteLine("Model accuracy info:");
            Console.WriteLine($"Accuracy: {accs.Average()},AUC: {aucs.Average()}");
            return model;
        }

        private static void PredictSentiment(MLContext mlContext, 
            ITransformer model, List<SentimentIssue> texts)
        {
            var predEngine = mlContext.Model
                .CreatePredictionEngine<SentimentIssue, SentimentPrediction>(model);

            Console.WriteLine("\nText     | Prediction | Positive probability");
            foreach (var item in texts)
            {
                var resultprediction = predEngine.Predict(item);
                // Prediction is already a bool, so no conversion is needed.
                var predSentiment = resultprediction.Prediction
                    ? "Positive" : "Negative";

                Console.WriteLine("{0} | {1} | {2}", 
                    item.Text, predSentiment, resultprediction.Probability);
            }
        }
    }
}

SentimentText.tsv file content.

Label Text
1 I like it
1 like it a lot
1 It's really good
1 Recommend! I really enjoyed!
1 It's really good
1 recommend too
1 outstanding performance
1 it's good! recommend!
1 Great!
1 really good. Definitely, recommend!
1 It is fun
1 Exceptional! liked a lot!
1 highly recommend this
1 fantastic show
1 exciting, liked.
1 it's ok
1 exciting show
1 amazing performance
1 it is great!
1 I am excited a lot
1 it is terrific
1 Definitely good one
1 Excellent, very satisfied
1 Glad we went
1 Once again outstanding!
1 awesome! excellent show
1 This is truly a good one!
0 it's mediocre!
0 Not good at all!
0 It is rude
0 I don't like this type
0 poor performance
0 Boring, not good at all!
0 not liked
0 I hate this type of things
0 not recommend, not satisfied
0 not enjoyed, I don't recommend this.
0 disgusting movie
0 waste of time, poor show
0 feel tired after watching this
0 horrible performance
0 not so good
0 so boring I fell asleep
0 a bit strange
0 terrible! I did not expect.
0 This is an awful
0 Nasty and horrible!
0 Offensive, it is crap!
0 Disappointing! not liked.
