Basics of Natural Language Processing / #NLP and Python Implementation

The Bag of Words (BoW) model is a simplifying representation used in Natural Language Processing (NLP) and Information Retrieval (IR). In this model, a text such as a sentence or a whole document is represented as the multiset (the "bag") of its words: grammar and word order are disregarded, but the number of times each word occurs is kept. The model has also been applied in computer vision.
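
To make the multiset idea concrete, here is a minimal sketch using Python's `collections.Counter`. The helper name `bag_of_words` and the two sample sentences are made up for illustration, and splitting on whitespace is a simplifying assumption.

```python
from collections import Counter

def bag_of_words(text):
    """Return the multiset (bag) of lowercase words in `text`."""
    return Counter(text.lower().split())

a = bag_of_words("the boy saw the girl")
b = bag_of_words("the girl saw the boy")

print(a["the"])  # "the" occurs twice in the first sentence
print(a == b)    # word order is ignored, so both sentences give the same bag
```

The two sentences mean different things, yet their bags are identical. That is exactly the information the model throws away.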

The Bag of Words model is most commonly used in document classification, where the number of occurrences of each word is used as a feature for training the classifier.

Let’s look at an example:

Consider these three sentences:

  1. He is a good boy.
  2. She is a good girl.
  3. Boy and Girl are good.

In these sentences, function words such as is, are, and am (and, in this example, he, she, a, and and) carry little information, so we leave them out of consideration. Now we count how often each remaining word occurs across all three sentences:

Word    Frequency
Good    3
Boy     2
Girl    2
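
The counting step above can be sketched as follows. The stop-word list is picked by hand for these three sentences and is an assumption, not a standard list; `tokenize` is a hypothetical helper name.

```python
import re
from collections import Counter

sentences = [
    "He is a good boy.",
    "She is a good girl.",
    "Boy and Girl are good.",
]

# Hand-picked stop words for this example only (an assumption).
STOP_WORDS = {"he", "she", "is", "are", "am", "a", "and"}

def tokenize(sentence):
    """Lowercase the sentence, keep alphabetic tokens, drop stop words."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]

frequency = Counter()
for sentence in sentences:
    frequency.update(tokenize(sentence))

print(frequency.most_common())
# [('good', 3), ('boy', 2), ('girl', 2)]
```

Lowercasing matters here: without it, "Boy" and "boy" would be counted as two different words.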

Using the words in this table as features, we can vectorize each of the given sentences.

Each feature records how many times the corresponding word occurs in sentences 1, 2, and 3 above:

Sentence    F1 (Good)    F2 (Boy)    F3 (Girl)
1           1            1           0
2           1            0           1
3           1            1           1
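
A minimal sketch of this vectorization step, assuming the vocabulary is fixed to the three feature words in the order F1, F2, F3. The function name `vectorize` is hypothetical.

```python
import re

sentences = [
    "He is a good boy.",
    "She is a good girl.",
    "Boy and Girl are good.",
]

# Feature order: F1 = good, F2 = boy, F3 = girl.
vocabulary = ["good", "boy", "girl"]

def vectorize(sentence, vocabulary):
    """Count how often each vocabulary word occurs in the sentence."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return [words.count(term) for term in vocabulary]

vectors = [vectorize(s, vocabulary) for s in sentences]
for number, vector in enumerate(vectors, start=1):
    print(number, vector)
# 1 [1, 1, 0]
# 2 [1, 0, 1]
# 3 [1, 1, 1]
```

Words that are not in the vocabulary are simply ignored, so the stop words drop out without a separate filtering step.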

After these feature calculations, we have a new table of numeric vectors that can be used as input for further NLP data processing.

Machine Learning Application Implementation:
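
As a minimal sketch of how word counts can serve as features for a classifier, the example below trains a tiny spam filter in plain Python. The choice of a multinomial naive Bayes model with add-one smoothing is an assumption (any count-based classifier would do), the four training messages are made up for illustration, and `tokenize`, `train`, and `predict` are hypothetical helper names.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def train(examples):
    """Collect per-label word counts, label counts, and the vocabulary."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocabulary = set()
    for text, label in examples:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocabulary.update(tokens)
    return word_counts, label_counts, vocabulary

def predict(text, word_counts, label_counts, vocabulary):
    """Return the label with the highest log-probability for the text."""
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # words never seen in training are skipped
            count = word_counts[label][token]
            # Add-one smoothing avoids log(0) for words unseen in this label.
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Made-up toy data, for illustration only.
training_data = [
    ("win a free prize now", "spam"),
    ("free money win now", "spam"),
    ("meeting at noon tomorrow", "ham"),
    ("lunch meeting tomorrow", "ham"),
]

model = train(training_data)
print(predict("free prize", *model))        # spam
print(predict("meeting tomorrow", *model))  # ham
```

The classifier never looks at word order. It only compares how often each word was counted under each label, which is the Bag of Words representation at work.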

Applications/Advantages:

  • Useful for spam filtering.
  • Handles small datasets easily.
  • Easy to implement.

Disadvantages:

  • Can’t prioritize the important words.
  • Every word has the same weight.
  • Doesn’t scale well to large datasets, because the vocabulary, and with it the length of every vector, grows with the corpus.
