Import Spam Filtering Dataset in Python | Tutorial No: 1. We are starting a tutorial series on SPAM detection for NLP. There are many ways to implement spam detection, but the method discussed here is easy to understand.
In this series of tutorials you will learn how to import a dataset in Python for the spam detection procedure. You will also learn the following aspects of spam detection:
How to import nltk in Python?
How to import sklearn in Python?
How to import matplotlib.pyplot as plt in Python?
How to import csv in Python?
How to import numpy as np in Python?
How to import re in Python?
How to import pandas in Python?
How to import wordcloud in Python?
How to import seaborn in Python?
How to import string in Python?
How to import regex in Python?
# Importing Mandatory Libraries
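To load the libraries used in this tutorial, run the following import statements in a Python shell, in a Jupyter Notebook cell, or in any Python session started from Anaconda. Importing does not install anything. nltk, sklearn (installed under the package name scikit-learn), matplotlib, numpy, pandas, wordcloud, seaborn and regex must already be installed, for example with pip or conda. csv, re and string are part of the Python standard library.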
>>>import nltk
>>>import sklearn
>>>import matplotlib.pyplot as plt
>>>import csv
>>>import numpy as np
>>>import re
>>>import pandas as pd
>>>import wordcloud
>>>import seaborn
>>>import string
>>>import regex
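The following minimal sketch is not part of the original tutorial. It checks which of these modules are available before you run the imports. It relies on the standard-library function `importlib.util.find_spec`, which returns `None` for a top-level module that cannot be found. The helper name `missing_modules` and the `REQUIRED_MODULES` list are illustrative assumptions.

```python
import importlib.util

# Module names used in the import statements above.
# Some PyPI package names differ from module names
# (e.g. the sklearn module is installed as "scikit-learn").
REQUIRED_MODULES = [
    "nltk", "sklearn", "matplotlib", "csv", "numpy", "re",
    "pandas", "wordcloud", "seaborn", "string", "regex",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All required modules are available.")
```

Any name this prints can then be installed with pip or conda before you rerun the imports.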
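# Checking the Current Working Directory in Python
The dataset is loaded below with a relative file name, so the SMSSpamCollection file must be in this directory.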
>>>import os
>>>os.getcwd()
Output: 'C:\\Users\\MuhammadAhmad\\jupyter using python'
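The following is a small sketch, under the assumption that the file is named `SMSSpamCollection` as in the `read_csv` call below. It confirms that the file is really in the working directory, so that a missing file produces a clear message. The helper `dataset_path` is hypothetical and uses only the standard `pathlib` module.

```python
from pathlib import Path


def dataset_path(filename="SMSSpamCollection", folder=None):
    """Return the path to `filename` inside `folder` (default: current working directory).

    Raises FileNotFoundError if the file is not there.
    """
    base = Path(folder) if folder is not None else Path.cwd()
    path = base / filename
    if not path.is_file():
        raise FileNotFoundError(f"{filename} not found in {base}")
    return path
```

If the call raises, either move the file into the directory printed by `os.getcwd()` or pass the folder that contains it.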
# Import Spam Filtering Dataset in Python
>>>smsspam = pd.read_csv('SMSSpamCollection', sep="\t", header=None)
>>>smsspam.head()
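The arguments in this call can be tried on two made-up lines in the same layout: a label, a tab, then the message, with no header row. The sample text is invented for illustration. `sep="\t"` splits on the tab, and `header=None` stops pandas from treating the first message as column names. As a result, the columns get the integer labels 0 and 1.

```python
import io

import pandas as pd

# Two made-up lines in the same layout as SMSSpamCollection:
# label <TAB> message, and no header row.
sample = "ham\tSee you at lunch?\nspam\tWIN a free prize, reply now!\n"

df = pd.read_csv(io.StringIO(sample), sep="\t", header=None)
print(df.shape)          # (2, 2)
print(list(df.columns))  # [0, 1]
```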
# Renaming the Dataset Columns
>>>smsspam.columns = ['label', 'sms']
>>>smsspam.head()
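As an alternative to renaming the columns after loading, `read_csv` accepts a `names` parameter that sets the column labels while reading. The sketch below uses an invented two-line sample instead of the real file.

```python
import io

import pandas as pd

sample = "ham\tSee you at lunch?\nspam\tWIN a free prize, reply now!\n"

# names= assigns the column labels during loading,
# so no separate `df.columns = [...]` step is needed.
df = pd.read_csv(io.StringIO(sample), sep="\t", header=None, names=["label", "sms"])
print(df.columns.tolist())  # ['label', 'sms']
```

Both approaches give the same column names. `names=` simply saves one step.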
# Checking Dataset Details
# View the details of the smsspam dataset: the number of rows and columns, the number of "ham" and "spam" messages, and the number of missing labels and messages.
>>>print(f'input data has {len(smsspam)} rows, {len(smsspam.columns)} columns')
>>>print(f'ham = {len(smsspam[smsspam["label"] == "ham"])}')
>>>print(f'spam = {len(smsspam[smsspam["label"] == "spam"])}')
>>>print(f"number of missing label = {smsspam['label'].isnull().sum()}")
>>>print(f"number of missing msg = {smsspam['sms'].isnull().sum()}")
Output:
input data has 5572 rows, 2 columns
ham = 4825
spam = 747
number of missing label = 0
number of missing msg = 0
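The same checks can be written more compactly with `value_counts()` and `isnull().sum()`. With `normalize=True`, `value_counts()` also shows the class balance. Using the counts above, spam is 747 / 5572 ≈ 13.4% of the messages and ham is about 86.6%, so the dataset is imbalanced. The small DataFrame below is made up for illustration and has the same two columns as `smsspam`.

```python
import pandas as pd

# Tiny made-up frame with the same two columns as smsspam.
df = pd.DataFrame({
    "label": ["ham", "spam", "ham", "ham"],
    "sms": ["hi", "WIN now", None, "ok"],
})

counts = df["label"].value_counts()                # ham -> 3, spam -> 1
shares = df["label"].value_counts(normalize=True)  # ham -> 0.75, spam -> 0.25
missing = df.isnull().sum()                        # label -> 0, sms -> 1

print(counts)
print(shares)
print(missing)
```

On the real data, `smsspam["label"].value_counts()` should report the same ham and spam totals as the filtered `len(...)` calls above.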