1.1. INTRODUCTION
2.1. EDUCATION
2.1.1. K12
2.2. MANUFACTURING
2.2.1. AUTOMOTIVE COMPONENT
2.2.2. PHARMACEUTICAL
4. SCHEDULE-WEEK WISE
5.1. STATISTICS
5.1.1. INFERENTIAL STATISTICS
5.2. PYTHON
6.1. INTRODUCTION
6.3. DATA EXPLORATION
7. CONCLUSION
CHAPTER 1
1.1. INTRODUCTION
TCS iON is a strategic unit of Tata Consultancy Services focused on Manufacturing
Industries (SMB), Educational Institutions and Examination Boards. TCS iON provides
technology by means of a unique IT-as-a-Service model, offering end-to-end business
solutions. It caters to the needs of multiple industry segments, through innovative, easy-to-
use, secured, integrated, hosted solutions in a build-as-you-grow, pay-as-you-use business
model. TCS iON serves its clients with the help of best practices gained through TCS' global
experience, domestic market reach, skills, and delivery capabilities. TCS iON's Cloud Based
Solution is highly modular, scalable and configurable, giving businesses and educational
institutions the benefits of increased efficiency, faster go-to-market, predictability of
technology and spend, and better business results.
1.1.1. INTEGRATED SOLUTIONS
1.1.2. INCREASED AGILITY
TCS iON brings in the agility to keep pace with changing processes or a new line of
business. It helps you configure the processes to work as you currently do, or recommends
industry best practices based on your business parameters and allows you to choose among
them. TCS iON also gives you increased convenience, allowing you to perform various tasks
from your mobile device no matter where you are, while keeping you automatically
compliant with statutory requirements.
Although TCS iON is a cloud service for education, exam boards and
manufacturing, the software is configurable to each sector. You will always get the
flavour of your business by picking and choosing the processes you need.
Furthermore, the multilingual capability of the solution allows you to customize the
solution label names to read in vernacular languages (such as Hindi, Marathi and Tamil),
enabling users to learn and operate the solution with ease.
CHAPTER 2
2.1. EDUCATION
2.1.1. K12
2.2. MANUFACTURING
2.2.1. AUTOMOTIVE COMPONENT
Automotive component manufacturers need to improve operational
efficiencies and implement best-in-class shop floor practices to meet high quality
standards and ensure faster turnaround and JIT delivery. These parameters need to be
met even while working on high volumes.
The iON Manufacturing Solution for the automotive component industry helps:
2.2.2. PHARMACEUTICAL
The pharmaceutical industry in India is faced with challenges such as cheap
imports and increased scrutiny from the US FDA. As a result, the industry requires a
balanced mix of imported and domestically purchased active pharmaceutical
ingredients (API), efficient supply chain management and ongoing R&D. The TCS
iON Manufacturing Solution is a domain-centric, analytics-driven cloud solution
which offers an integrated view of operations across the organization.
The iON Manufacturing Solution for the pharmaceutical industry helps:
CHAPTER 3
CHAPTER 4
SCHEDULE-WEEK WISE
1. Week 1
I. Descriptive Statistics
II. Inferential Statistics
2. Week 2
I. Python Basics
II. Data Types
III. Python Libraries
3. Week 3
I. Linear Regression
II. Logistic Regression
III. Decision Tree
IV. Random Forest
V. Clustering
4. Week 4
Mini-Project
CHAPTER 5
5.1. STATISTICS
Statistics is the science of collecting, organizing, analysing, and interpreting
data in order to make decisions.
A "sample" is defined as the portion of a population that has been selected
for analysis. Rather than selecting every item in the population, statistical sampling
procedures focus on collecting a small, representative group from the larger population.
SAMPLING METHODS-
It can be observed from the above graph that the distribution is symmetric
about its center, which is also the mean (0 in this case). This makes the probability of
events at equal deviations from the mean, equally probable. The density is highly
centered around the mean, which translates to lower probabilities for values away
from the mean.
The probability density function of the general normal distribution is given as:

f(x) = (1 / (σ√(2π))) · e^(−(x − µ)² / (2σ²))

In the above formula, σ is the Standard Deviation and µ is the Mean. It is easy to
get overwhelmed by the formula while trying to understand everything in one glance,
but we can break it down into smaller pieces so as to get an intuition as to what is
going on.
The exponent of e in the above formula is the square of the z-score times -1/2.
This is actually in accordance to the observations that we made above. Values away
from the mean have a lower probability compared to the values near the mean. Values
away from the mean will have a higher z-score and consequently a lower probability
since the exponent is negative. The opposite is true for values closer to the mean.
This gives way to the 68-95-99.7 rule, which states that 68%, 95% and 99.7% of all
values in a normal distribution lie within bands around the mean that are two, four
and six standard deviations wide, respectively. The figure given below shows this rule.
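The 68-95-99.7 rule can also be checked numerically: for a standard normal variable, P(|Z| < k) = erf(k/√2), so a short sketch using only the standard library reproduces all three percentages.

```python
import math

# P(|Z| < k) for a standard normal variable equals erf(k / sqrt(2));
# the loop reproduces the 68-95-99.7 rule numerically.
for k in (1, 2, 3):
    p = math.erf(k / math.sqrt(2))
    print(f"within {k} standard deviation(s): {p:.4f}")
```

Running this prints values that round to 0.6827, 0.9545 and 0.9973, matching the rule.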
We know that 95% of the values lie within 2 (1.96 to be more accurate)
standard deviation of a normal distribution curve. So, for the above curve, the blue
shaded portion represents the confidence interval for a sample mean of 0.
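As a concrete sketch of such an interval, a 95% confidence interval for a sample mean can be computed with the 1.96 multiplier described above. The sample data here is illustrative, not from the report.

```python
import math
import statistics

# Illustrative sample (assumed data for demonstration only).
sample = [4.8, 5.1, 4.9, 5.3, 5.0, 4.7, 5.2, 5.0]

mean = statistics.mean(sample)
# Standard error of the mean = s / sqrt(n)
se = statistics.stdev(sample) / math.sqrt(len(sample))
# 95% of the area under a normal curve lies within 1.96 standard errors.
lower, upper = mean - 1.96 * se, mean + 1.96 * se
print(f"mean = {mean:.2f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```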
Points to be noted:
We cannot accept the null hypothesis; we can only reject it or fail to reject it.
As a practical tip, the null hypothesis is generally the claim we want to
disprove.
For example, suppose you want to prove that students performed better on their
exam after taking extra classes. The null hypothesis, in this case, would be that the
marks obtained after the classes are the same as before the classes.
The region of rejection consists of values of the test statistic that are unlikely
to occur if the null hypothesis is true. These values are more likely to occur if the null
hypothesis is false. Therefore, if a value of the test statistic falls into the rejection
region, you reject the null hypothesis, because that value is unlikely if the null
hypothesis is true.
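The extra-classes example above can be sketched as a simple paired test. The marks below are assumed for illustration, and a large-sample z approximation stands in for a proper t-test.

```python
import math
import statistics

# Hypothetical before/after exam marks for 10 students (illustrative data).
before = [62, 58, 71, 65, 60, 68, 55, 63, 70, 59]
after = [66, 64, 75, 66, 65, 72, 60, 65, 74, 63]

# Paired differences; H0: the mean difference is 0 (classes had no effect).
diffs = [a - b for a, b in zip(after, before)]
mean_d = statistics.mean(diffs)
se = statistics.stdev(diffs) / math.sqrt(len(diffs))
z = mean_d / se  # large-sample z approximation of the test statistic

# Two-sided p-value from the standard normal CDF.
p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
print(f"z = {z:.2f}, p = {p:.4f}")
if p < 0.05:
    print("Reject the null hypothesis: the test statistic is in the rejection region")
```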
5.2. PYTHON
Python is a high-level, interpreted, interactive and object-oriented scripting
language. Python is designed to be highly readable. It frequently uses English
keywords where other languages use punctuation, and it has fewer syntactical
constructions than other languages.
Easy-to-read − Python code is more clearly defined and visible to the eyes.
A broad standard library − Python's bulk of the library is very portable and
cross-platform compatible on UNIX, Windows, and Macintosh.
Interactive Mode − Python has support for an interactive mode which allows
interactive testing and debugging of snippets of code.
Portable − Python can run on a wide variety of hardware platforms and has
the same interface on all platforms.
Apart from the above-mentioned features, Python has a big list of good features; a few
are listed below:
It provides very high-level dynamic data types and supports dynamic type
checking.
It can be easily integrated with C, C++, COM, ActiveX, CORBA, and Java.
The data stored in memory can be of many types. For example, a person's age
is stored as a numeric value and his or her address is stored as alphanumeric
characters.
Python has various standard data types that are used to define the operations
possible on them and the storage method for each of them.
1. Numbers
2. String
3. List
4. Tuple
5. Dictionary
Number data types store numeric values. Number objects are created when
you assign a value to them. For example −
var1 = 1
You can also delete the reference to a number object by using the del statement. The
syntax of the del statement is –
del var1[,var2[,var3[....,varN]]]
You can delete a single object or multiple objects by using the del statement. For
example −
del var
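The assignment and del statements above can be combined into a small sketch:

```python
# Number objects are created on assignment; del removes the references.
var1 = 1
var2 = 10
print(var1 + var2)  # -> 11

del var1, var2      # both names are now undefined
try:
    print(var1)
except NameError:
    print("var1 is no longer defined")
```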
The plus (+) sign is the string concatenation operator and the asterisk (*) is the
repetition operator.
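A minimal illustration of the two string operators just mentioned:

```python
s = "Hello"
greeting = s + " World"   # + concatenates strings
repeated = s * 2          # * repeats a string
print(greeting)           # -> Hello World
print(repeated)           # -> HelloHello
```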
5.2.5. PYTHON LISTS-
Lists are the most versatile of Python's compound data types. A list contains
items separated by commas and enclosed within square brackets ([]). To some extent,
lists are similar to arrays in C. One difference between them is that the items
belonging to a list can be of different data types. The values stored in a list can be
accessed using the slice operator ([ ] and [:]), with indexes starting at 0 at the
beginning of the list and working their way to end -1. The plus (+) sign is the list
concatenation operator, and the asterisk (*) is the repetition operator.
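The list behaviour described above, sketched with an illustrative list of mixed types:

```python
# A list may mix data types; slices use [ ] and [:], indexes start at 0.
lst = ['abcd', 786, 2.23, 'john', 70.2]

print(lst[0])        # -> abcd          (first item)
print(lst[1:3])      # -> [786, 2.23]   (slice from index 1 up to 3)
print(lst[2:])       # -> [2.23, 'john', 70.2]
print(lst + [123])   # + concatenates lists
print([1, 2] * 2)    # -> [1, 2, 1, 2]  (* repeats a list)
```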
5.2.6. PYTHON TUPLES-
A tuple is another sequence data type that is similar to the list. A tuple
consists of a number of values separated by commas. Unlike lists, however, tuples
are enclosed within parentheses.
The main differences between lists and tuples are: lists are enclosed in brackets ([ ])
and their elements and size can be changed, while tuples are enclosed in
parentheses (( )) and cannot be updated. Tuples can be thought of as read-only lists.
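The list/tuple difference can be demonstrated directly:

```python
lst = [1, 2, 'three']   # brackets: mutable
tup = (1, 2, 'three')   # parentheses: read-only

lst[0] = 99             # fine, lists can be updated
try:
    tup[0] = 99         # tuples cannot be updated in place
except TypeError:
    print("tuples cannot be updated")
```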
5.2.7. PYTHON DICTIONARY-
Python's dictionaries are a kind of hash table. They work like the associative
arrays or hashes found in Perl and consist of key-value pairs. A dictionary key can be
almost any Python type, but keys are usually numbers or strings. Values, on the other
hand, can be any arbitrary Python object.
Dictionaries are enclosed by curly braces ({ }) and values can be assigned and
accessed using square braces ([]).
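A short sketch of the dictionary syntax just described (the data is illustrative):

```python
# Curly braces create the dictionary; square braces assign and access values.
person = {'name': 'Ravi', 'age': 21}
person['city'] = 'Chennai'       # assign a new key-value pair

print(person['name'])            # -> Ravi
print(sorted(person.keys()))     # -> ['age', 'city', 'name']
```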
Data Analytics
5.5. MACHINE LEARNING
Data Gathering – gathering past data in the form of text files, Excel files, images
or audio. The better the quality of the data, the better the model learning.
Data Processing – sometimes the collected data is in raw form and needs to be
rectified. For example, if the data has some missing values, they have to be handled.
Model Building – building models with suitable algorithms and techniques and
then training them.
Testing – testing our prepared model with data which was not fed in at the time of
training, and so evaluating the performance (score, accuracy) with a high level of
precision.
For instance, suppose you are given a basket filled with different kinds of
fruits. Now the first step is to train the machine with all different fruits one by one like
this:
If the shape of the object is rounded with a depression at the top and the colour
is red, then it will be labelled as Apple.
Now suppose that after training, the machine is given a new, separate fruit from
the basket, say a banana, and asked to identify it.
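The fruit example can be sketched as a toy rule-based classifier; the "rules" below stand in for what a real model would learn from the labelled training data.

```python
def classify(shape, colour):
    # Rule learned from training: rounded + red (+ depression at top) -> Apple
    if shape == 'rounded' and colour == 'red':
        return 'Apple'
    if shape == 'long' and colour == 'yellow':
        return 'Banana'
    return 'unknown'

# The new, unseen fruit from the basket:
print(classify('long', 'yellow'))   # -> Banana
```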
2. UNSUPERVISED LEARNING:
For instance, suppose the machine is given an image containing both dogs and
cats that it has never seen before. The machine has no idea about the features of dogs
and cats, so it cannot categorize them as such. But it can categorize them according to
their similarities, patterns and differences; i.e., we can easily split the picture into two
parts, the first containing all the images with dogs in them and the second all the
images with cats. Here the machine learned nothing beforehand: there is no training
data and there are no examples.
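Grouping by similarity without labels can be sketched with a tiny one-dimensional k-means (k = 2); the points below are illustrative, and the min/max initialisation is a simplification that only suits k = 2.

```python
def kmeans_1d(points, k=2, iters=10):
    # Initialise the two centers at the extremes (works for k = 2).
    centers = [min(points), max(points)]
    clusters = []
    for _ in range(iters):
        # Assign each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return clusters

# Two visibly similar groups, with no labels attached.
points = [1.0, 1.2, 0.8, 8.0, 8.3, 7.9]
print(kmeans_1d(points))   # -> [[1.0, 1.2, 0.8], [8.0, 8.3, 7.9]]
```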
Figure 5.11: Neuron
Basically, a neuron takes an input signal (dendrite), processes it like the CPU
(soma), and passes the output through a cable-like structure (the axon) to other
connected neurons.
It may be divided into 2 parts. The first part, g takes an input (dendrite),
performs an aggregation and based on the aggregated value the second part, f makes a
decision.
We can see that g(x) is just doing a sum of the inputs, a simple aggregation,
and theta here is called the thresholding parameter. For example, if I always watch the
game when the sum turns out to be 2 or more, then theta is 2. This is called the
Thresholding Logic.
Now, this is very similar to an M-P neuron, but we take a weighted sum of the
inputs and set the output as one only when the sum is more than an arbitrary threshold
(theta). However, by convention, instead of hand-coding the thresholding
parameter theta, we add it as one of the inputs with the weight -theta, as shown
below, which makes it learnable.
Here, w_0 is called the bias because it represents the prior (prejudice). A
football freak may have a very low threshold and may watch any football game
irrespective of the league, club or importance of the game [theta = 0]. On the other
hand, a selective viewer like me may only watch a football game that is a premier
league game, features Man United and is not a friendly [theta = 2]. The point is,
the weights and the bias will depend on the data (my viewing history in this case).
Based on the data, if needed, the model may have to give a lot of importance (a high
weight) to the isManUnitedPlaying input and penalize the weights of the other inputs.
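The weighted-sum-with-bias unit described above can be sketched as follows; the inputs and weights are illustrative assumptions for the football example.

```python
# The bias w0 plays the role of -theta: the unit outputs 1 only when the
# weighted sum of the inputs plus the bias reaches 0.
def neuron(inputs, weights, bias):
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= 0 else 0

# Inputs: [is_premier_league, is_man_united_playing, is_friendly]
# A selective viewer: theta = 2, so bias = -2; friendlies count against.
print(neuron([1, 1, 0], [1, 1, -1], -2))   # -> 1 (watch the game)
print(neuron([0, 0, 1], [1, 1, -1], -2))   # -> 0 (skip it)
```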
CHAPTER 6
6.1. INTRODUCTION
In today's era, the impact of social networking media such as
Facebook, Google Plus, YouTube, blogs, and Twitter is increasing rapidly day by
day. Millions of people are connected with each other on social networking sites and
express their sentiments and opinions through tweets and comments. This motivates
the automatic mining and classification of the views, emotions, opinions, and feelings of
people on social networking websites. Sentiment analysis is the process of analyzing
data in order to extract sentiment or opinions. It is also known as subjectivity
analysis, opinion mining and sentiment classification. An example of sentiment
analysis is choosing a US airline: before booking with an airline, a customer reads
reviews about that airline's services. With the help of sentiment analysis, customers
can find the opinions of other people, i.e. whether or not they were satisfied with the
quality of the services. Sentiment analysis is a type of natural language processing task
that tracks the views of people about a certain thing or topic and categorizes these views
into two classes, i.e. positive and negative: the positive class contains the positive
opinions of users, and the negative class the negative ones.
1. Retrieval of tweets: As Twitter is the most widely used part of social networking
sites, it consists of various blogs related to various topics worldwide. Instead
of taking whole blogs, we search on a particular topic, download all of its
web pages, and then extract them in the form of text files using a mining tool, i.e. Weka,
which provides a sentiment classifier.
A. Filtering
B. Tokenization
C. Removal of stopwords
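Steps A-C above can be sketched in a few lines; the stop-word list and sample tweet are illustrative.

```python
import re

# A tiny illustrative stop-word list (real lists are much longer).
STOPWORDS = {'the', 'is', 'a', 'an', 'to', 'of', 'and', 'i'}

def preprocess(tweet):
    # A. Filtering: strip URLs, @mentions and the '#' symbol.
    tweet = re.sub(r'http\S+|@\w+|#', '', tweet)
    # B. Tokenization: lowercase and split into alphabetic tokens.
    tokens = re.findall(r'[a-z]+', tweet.lower())
    # C. Removal of stopwords.
    return [t for t in tokens if t not in STOPWORDS]

print(preprocess("@united the flight to Delhi is delayed again! http://t.co/x"))
# -> ['flight', 'delhi', 'delayed', 'again']
```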
Toolkits used
We can observe that the data set contains 14640 rows and 15
columns. 'airline_sentiment' is the column we are going to predict, which takes the
values positive, negative and neutral.
We take only the tweets we are very confident about. We use the BeautifulSoup library
to process the HTML encoding present in some tweets as a result of scraping.
We are going to use cross validation and grid search to find good hyperparameters for
our SVM model. We need to build a pipeline so that we do not take features from the
validation folds when building each training model.
CHAPTER 7
CONCLUSION