Submitted By:
(Signature)
Acknowledgment
Then I would like to thank Mr. Charan Reddy for his enthusiastic approach
and dedication which has indeed been a great source of inspiration and
support to us. He deserves a real acknowledgment at this juncture.
TABLE OF CONTENTS
1. Introduction...........................................................06 2.
Objectives......................................................07
3.Methods and Concepts.........................................10
4.Results and Discussions........................................12
5.Conclusion......................................................16
6.Future Perspective.............................................16
7.References......................................................17
Introduction
Though natural language processing tasks are closely intertwined, they are frequently
subdivided into categories for convenience. A coarse division is given below.
Grammar induction[13]
As NLP deals with processing of languages our project motto is to detect type of
Programming language that a given code belongs to.
The code may be of any programming language and it was identified by the Built in
stack_overflow dataset
Objectives
The main objective was to recognize different programming languages that the given input code
belongs to.
The goal of natural language processing (NLP) is to design and build computer systems that are
able to analyze natural languages like German or English, and that generate their outputs in a
natural language, too. Typical applications of NLP are information retrieval, language
understanding, and text classification. The development of statistical approaches for these
applications is one of the research activities at Lehrstuhl für Informatik 6.
Information retrieval (IR) deals with the representation, storage, organization of, and access to
information items. Given a query the goal is to extract a subset of documents from a large data
collection that satisfies a user's information need. Besides written texts the database may also
contain multimedia documents, e.g. audio and video data.
In natural language understanding, the objective is to extract the meaning of an input sentence
or an input text. Usually, the meaning is represented in a suitable formal representation language
so that it can be processed by a computer.
The goal in text classification is to assign a text document to one out of several text classes. For
newspaper articles, such classes are sports reports, finances, and politics.
Information Retrieval
Natural Language Understanding
Spoken Dialogue Systems
Text Classification and Clustering
Methods and Concepts
The methods and concepts used in this project are briefly explained below.
The concept of pandas , keras and its preprocessing models are used.
train : It is used to train the dataset given,attributes with common features will be
grouped together.
Numpy is the fundamental package for scientific computing with python.
The project starts by importing the required modules from the anaconda as
mentioned below
Import pandas as pd
This command imports the package pandas likewise other packages are also
imported
Next step we have to download the stack_overflow dataset from the provided URL
We have to specify the path of the dataset downloaded and convert it to csv form
data.to_csv(stack-overflow-data.csv)
By this ,the dataset will be converted to csv format,we can look the head of the data
by
df.head(),it will display the head portion of the data in the post and tag format
Df.post() the given dataset contains posts and tags, the given command will give the
posts that are found in the dataset.
Tkinter :this was a python’s de-facto standard GUI package.It is a thin object
oriented layer on top of Tcl/Tk.
def predict():
print("Prediction on progress..")
entered_input=E1.get()
print("Entered Input",entered_input)
sample=tokenize.texts_to_matrix([entered_input])
k=model.predict_classes(new)
#entered_input=cv.transform([entered_input])
#y_pred=model.predict(entered_input)
print(k)
l=encoder.classes_[k]
L2 = Label(top, text="Prediction: "+l)
L2.pack()
top.mainloop()
print(k)
The above predict() function will give the required output i.e the type of code given
Tokenization() :it was used to convert a paragraphs to sentences and sentences to
words,finally words are converted to literals.
sample=tokenize.texts_to_matrix([entered_input]) :this was used to make a
matrix form with the converted words.
RESULTS
As of given the model should process the input and have to predict the type of
programming language that a given input belongs to.
Natural Language Processing (NLP) was useful for the human-machine interaction
As machine can only understand the binary language NLP convenrts the given input
Into binary form and sends it to the system, according input the system will generate
the output .
The output was then coverted to user understanble format by the NLP.this process
was very useful and commonly implemented .The conclusion of the project is to
Determine/predict the exact programming language that a given input code belongs.
FUTURE PERSPECTIVE
1.Practical implications
– A limited literature is available on the classification of maintenance optimization
models and on its associated case studies. The paper classifies the literature on
maintenance optimization models on different optimization techniques and based on
emerging trends it outlines the directions for future research in the area of maintenance
optimization.
2.information extraction purpose
3.industry monitoring
4.educational perspective
REFERENCES
1.stack-overflow-data:## https://storage.googleapis.com/tensorflow-workshop-
examples/stack-overflow-data.csv
2.Hands-on Data Science With Anaconda by James Yan,Dr.yuxing Yan
3.Data analysis with numpy and pandas by curtis miller