Geeks of Coding

Basics of Natural Language Processing / #NLP and Python Implementation

The Bag of Words (BoW) model is a simplifying representation used in Natural Language Processing (NLP) and Information Retrieval (IR). In this model, a text such as a sentence or a whole document is represented as the multiset (bag) of its words: grammar and word order are disregarded, but the number of times each word occurs is kept. The model has also been used in computer vision.

Bag of Words is most commonly used in document classification, where the number of occurrences of each word is used as a feature for training the classifier.

Let’s look at an example:

There are three sentences, given below:

  1. He is a good boy.
  2. She is a good girl.
  3. Boy and Girl are good.

In the above sentences, words such as is/are/am (commonly called stop words) are irrelevant features, so we will not take them into consideration. Next, we calculate the frequency of each important word across the three sentences.
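The counting step can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the `STOP_WORDS` set and the `tokenize` helper are assumptions made for this example, and treating "he", "she", "a" and "and" as stop words is a choice of this sketch rather than something the text above spells out.

```python
import re
from collections import Counter

sentences = [
    "He is a good boy.",
    "She is a good girl.",
    "Boy and Girl are good.",
]

# Assumed stop-word list for this example only; real projects usually
# take a longer list from an NLP library.
STOP_WORDS = {"he", "she", "is", "a", "and", "are", "am"}


def tokenize(text):
    """Lowercase the text, split it into words, and drop stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


# Count how often each remaining word occurs across all three sentences.
frequency = Counter(word for s in sentences for word in tokenize(s))

for word, count in frequency.most_common():
    print(word, count)
```

With this stop-word list, the counts come out as good: 3, boy: 2, girl: 2.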


Using this frequency table, we vectorize each sentence to calculate its features.

Here, we simply record, for each of sentences 1, 2, and 3 above, which words of the bag (multiset) occur in it and how many times.
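A rough sketch of this vectorization step follows. It assumes the vocabulary `good`, `boy`, `girl` (the words left after stop-word removal) and the same illustrative stop-word list; `vectorize` is a hypothetical helper name, not part of any library.

```python
import re

sentences = [
    "He is a good boy.",
    "She is a good girl.",
    "Boy and Girl are good.",
]

# Assumed for this example: stop words to drop and the resulting vocabulary.
STOP_WORDS = {"he", "she", "is", "a", "and", "are", "am"}
VOCABULARY = ["good", "boy", "girl"]


def vectorize(text, vocabulary):
    """Return one count per vocabulary word for the given sentence."""
    words = [
        w for w in re.findall(r"[a-z]+", text.lower())
        if w not in STOP_WORDS
    ]
    return [words.count(term) for term in vocabulary]


vectors = [vectorize(s, VOCABULARY) for s in sentences]

# Print a small table: one row per sentence, one column per word.
print("sentence", *VOCABULARY)
for number, vector in enumerate(vectors, start=1):
    print(number, *vector)
```

Each row is the feature vector of one sentence: sentence 1 becomes `[1, 1, 0]`, sentence 2 becomes `[1, 0, 1]`, and sentence 3 becomes `[1, 1, 1]`.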


After these feature calculations, we have a new output table that can be used for NLP data processing.

Machine Learning Application Implementation:
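One way such an application can look, as a minimal sketch rather than a definitive implementation: a tiny spam classifier that uses bag-of-words counts as features for a multinomial Naive Bayes model with add-one smoothing. The choice of Naive Bayes, the helper names (`bag_of_words`, `train`, `predict`), and the toy training messages are all assumptions invented for illustration. Stop words are not removed here, to keep the example short.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Return the multiset of lowercase words in the text as a Counter."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def train(examples):
    """Collect per-label word counts and document counts from (text, label) pairs."""
    word_counts = {}
    doc_counts = Counter()
    for text, label in examples:
        word_counts.setdefault(label, Counter()).update(bag_of_words(text))
        doc_counts[label] += 1
    vocabulary = set()
    for counts in word_counts.values():
        vocabulary.update(counts)
    return word_counts, doc_counts, vocabulary


def predict(text, model):
    """Return the label with the highest Naive Bayes log-probability."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        # Start from the log prior of the label.
        score = math.log(doc_counts[label] / total_docs)
        for word, n in bag_of_words(text).items():
            if word not in vocabulary:
                continue  # ignore words never seen during training
            # Add-one (Laplace) smoothing avoids taking log(0).
            p = (counts[word] + 1) / (total_words + len(vocabulary))
            score += n * math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training data, invented for this example.
training_data = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting at noon tomorrow", "ham"),
    ("lunch with the team tomorrow", "ham"),
]

model = train(training_data)
print(predict("claim your free prize", model))   # spam
print(predict("team meeting tomorrow", model))   # ham
```

The classifier never looks at word order: it only sees how many times each word occurs, which is exactly the information a bag of words keeps.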


Applications/Advantages:

  • Useful for spam filtering.
  • Handles small datasets easily.
  • Easy to implement.

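Disadvantages:

  • Can’t prioritize the important words.
  • Every word has the same value: a word is weighted only by its raw count, and word order and context are ignored.
  • Scales poorly to big data, because the vocabulary and the resulting sparse vectors grow with the corpus.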

Abhishek Tyagi

Currently a Mechatronics Engineer and a machine learning and deep learning enthusiast who loves to research various topics.
