Response Paper On Crowdsourcing Dialect Characterization Through Twitter

1.0 Introduction
This research article by Goncalves and Sanchez of Aix-Marseille University, France,
caught my attention more than any other on the subject of characterising dialect. Dialect
says a good deal about a person's geographical origin, since people living in different
regions may use different words for the same thing. Linguists are fascinated by dialects
because they reveal social class, patterns of immigration and how groups have influenced
each other in the past. In this research, Spanish was chosen as the language of study because
it is one of the most widely spoken languages in the world and is spatially distributed over
several continents.
Crowdsourcing is one of the fastest and most reliable ways of collecting data. The
researchers gathered and analysed data over two to three consecutive years, using the
language detector from the Chromium Compact Language Detector software library and the
Twitter gardenhose to collect and geolocate an unbiased sample of tweets from Twitter users.
Their main finding is that Spanish has two superdialects, and they also produce a list of
words that characterises the Spanish varieties, which is interesting to study in its own
right.
2.0 Research Review
2.1 Research Elements
The element of this research article that I like most is the abstract, in which the
authors compress the research into one compact and comprehensive summary that lets readers
understand what the entire study is about. Nevertheless, the article lacks some important
subheadings that would make it clearer and easier to follow. Under the Methodology heading
there should be three components that set out the research in detail: Study Sample, Data
Collection and Measures. With these three components the article would be more consistent,
more systematic and easier to understand. Without them, confusion arises because readers have
to keep track of the important points themselves and remember whether or not the researchers
have already mentioned them. The article does, however, combine Results and Discussion under
one heading, which is divided into subheadings that each focus on an individual point or
result. Interestingly, the researchers also claim that this research opens the way for much
further and more detailed work, since their large dataset can be analysed in many kinds of
related sociolinguistic study. They also welcome the modern technology that has contributed
to more consistent findings.
In addition, this research was featured on the technologyreview.com website, which
helps to promote computational linguistics, the study of computer processing, understanding
and generation of human languages. It is often regarded as a subfield of artificial
intelligence. Techniques from computational linguistics are used in applications such as
machine translation, speech recognition, information retrieval, intelligent Web searching
and intelligent spell checking. This study by Goncalves and Sanchez is one of the most recent
uses of these techniques in linguistics.
2.2 Related Work
While this research is claimed to be the first study of dialect on Twitter to reveal
patterns on a global scale, the researchers should still have introduced readers to related
work that has been done before. The most famous American study was performed by Hans Kurath
in the second quarter of the twentieth century and covered most of the eastern quarter of the
United States. Kurath used the traditional method of travelling to various locales and
interviewing speakers, and what he, like all dialectologists, looked for were isoglosses
(iso = same, gloss = speech): boundaries separating regions of a country that use different
words or constructions to describe the same things. He found that in some parts of the
country the isoglosses for several unrelated words fell in practically the same locations,
forming bundles of isoglosses. These bundles were significant discoveries, as they indicated
the existence of a real correlation between speech patterns and region. They also provided a
living linguistic reminder of the patterns of migration of Americans moving westward. The
significance of the current research by Goncalves and Sanchez, by contrast, is its use of a
reliable resource involving millions of data points, without the researchers having to leave
their seats to travel and meet people face to face. Their initial aim is to promote
computational linguistics by highlighting the dialects of another language, in this case
Spanish.
The researchers set out mainly to improve on earlier research, since in their view the
traditional methodology is no longer adequate for building concrete findings. They bring in
modern technology that delivers results faster and that, through GPS location data, takes in
participants from every community living in a given area. Furthermore, they want the data to
reflect individuals using their vernacular freely in a medium of their own choosing, namely
a social network (i.e. Twitter).
Given the aim of the study, the researchers could probably have chosen any language as
their focus. At first I was surprised, because I would have liked to see results for English,
but they chose Spanish instead. Then again, many studies have already been done on English,
so for me this was also a fresh opportunity to learn about Spanish. In theory, even though
English and Mandarin are among the languages with the most speakers in the world, neither
is well suited to this kind of dialect research: Mandarin speakers have limited local access
to Twitter, and English would require a complex lexicographic analysis because of its
abundance of homographs, whose intended meaning can be ambiguous. Spanish is therefore the
most suitable language, as it has a vast population of speakers around the globe, and the
researchers made a practical choice that gives the study an unbiased language, which I find
a reasonable and rational decision.
2.3 Methodology
The measures, as written, are heavily packed with matrices and statistical symbols, and
the way the researchers express them is quite complicated. This is quickly offset, however,
by the figures and maps, which are among the easiest ways to see the highlighted concepts
clearly. Readers still need some background knowledge of statistics to understand how the
calculations work. Using maps to present the geographical sites makes it easier for readers
to relate to the findings, since images are a good aid in conveying what the researchers
have discovered. The plotted, colour-coded maps also give a clear view of which words are
commonly used in which parts of the world. However, to simplify the data further, the
researchers should also have summarised the findings in a table, which would make it easier
to identify the countries or regions that show different geographic dialects. According to
socialresearchmethods.net, tables should be concisely formatted, while measures should be
briefly explained with appropriate citations and references. The complete measures should be
placed in an appendix.
On the other hand, the researchers keep relating their work to the traditional method,
because they want readers to understand their concept and to gain insight into how the
young, urban and technology-savvy generation uses the language. The research team
successfully took on the challenge of analysing over 50 million tweets and narrowed down
their research by looking at the environment in which the words were used, whether urban or
rural, which finally revealed a major surprise: the existence of two Spanish superdialects.
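The idea of tallying which word each location prefers for the same concept, and of
summarising the result in a table rather than only on a map, can be illustrated with a
minimal Python sketch. Everything in it is an assumption made for illustration: the toy
tweets, the region labels, the variant list for the concept "car" and the helper names
variant_shares and dominant_variant are invented here and are not the authors' data,
code or measures.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (region, tweet text). Not taken from the study.
TWEETS = [
    ("Madrid", "mi coche no arranca"),
    ("Madrid", "vendo coche casi nuevo"),
    ("Bogota", "el carro esta en el taller"),
    ("Bogota", "lavando el carro"),
    ("Buenos Aires", "me compre un auto"),
    ("Buenos Aires", "el auto y el coche del vecino"),
]

# Hypothetical list of word variants for one concept ("car").
VARIANTS = {"coche", "carro", "auto"}


def variant_shares(tweets, variants):
    """Return {region: {variant: share}} for the given word variants."""
    counts = defaultdict(Counter)
    for region, text in tweets:
        # Naive whitespace tokenisation; real tweets would need more care.
        for token in text.lower().split():
            if token in variants:
                counts[region][token] += 1
    shares = {}
    for region, counter in counts.items():
        total = sum(counter.values())
        shares[region] = {word: n / total for word, n in counter.items()}
    return shares


def dominant_variant(shares):
    """Return {region: most frequent variant} from the shares table."""
    return {region: max(table, key=table.get) for region, table in shares.items()}


if __name__ == "__main__":
    table = variant_shares(TWEETS, VARIANTS)
    for region, preferred in dominant_variant(table).items():
        print(region, preferred, table[region])
```

A table built this way, with one row per region and one column per variant, is the kind of
summary suggested above; it is far simpler than the statistical measures in the article
and is meant only to make the basic counting step concrete.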
2.4 References
While readers ought to be able to read smoothly, they are interrupted by the struggle
to follow the results and discussion, because the researchers do not present their supporting
arguments in the text of the article itself. For the most part, the researchers rely on the
reference list and leave readers to look up the sources and make the connections throughout
the article on their own. This is unsatisfactory for readers, who have to do the researchers'
work just to check that the research actually follows the sources in the reference list.
Moreover, readers who are not familiar with this field will find it a nuisance to keep track
of both the article and the related references. Supporting details or quotations that back up
the points, with other researchers, scholars or studies mentioned in the text itself, would
make the research more reliable and solid.
Turning to the list of references itself, the researchers provide twenty-seven
references taken from books, online articles, conference papers, related articles, journals
and university press publications. The publication years range from 1998 to 2013, so the
references are reasonably recent and up to date. The list is numbered, and the same numbers
are used in the article so that readers can follow each reference. In addition, some
references give page numbers for further reading and to show exactly where the points are
taken from. Readers who already have access to these references have the advantage of
additional information, while others will not be satisfied with the brief evidence given in
the article itself.
Another interesting name in the list of references, and a common one in this field, is
Nguyen C.D, whose research on predicting gender and age from language use also relies on
Twitter data. This means that both sets of researchers draw on the same resource, the online
community, to observe language use in day-to-day conversation. One advantage of using online
data is that the people sampled are not constrained by the usual conditions of a research
sample: they tweet freely, without restraint, and it is these tweets that are examined, which
supports the reliability and validity of the sample data.
3.0 Research Advantages
The research team made a new discovery about Spanish dialects when they found evidence
of two superdialects. Learning dialects is not an easy task, but when we know where a dialect
comes from, we should be aware of and respect the community in that area, especially when it
comes to the use of particular words. A complete list of 43 words for each concept is truly
beneficial to linguists, travellers, tourists and communities in other areas, who can use it
to understand these concepts in conversation. At the same time, people living in urban and
rural areas may notice the differences between their dialects and, through this research,
come to appreciate each other while holding on to their own identity and originality.
According to James Lantolf, Penn State professor of Spanish and linguistics and director of
the Center for Language Acquisition, the patterns of settlement when an area was first
discovered and developed have a huge impact on dialect. For instance, a regional dialect is
largely attributable to the various nationalities that settled and lived in that area.
Lantolf points out that "a region's geographic location also has a direct influence on the
development of a local tongue," and he continues, "where there is no contact between regions,
entire words, languages and vernaculars can grow and evolve independently."
Many native Spanish speakers have difficulty understanding dialects from other regions
even though they are speaking the same language. This research selects part of the lexical
corpus to distinguish the words used in each region. It may be a useful reference for
speakers, who can compare and contrast these words with their own and be exposed to the
other dialects of their language. The research could also suggest which Spanish dialect is
most widely spoken, giving a general indication of what is familiar to Spanish speakers
around the world. Sometimes even people who are very good at Spanish cannot understand other
Spanish speakers because of these variations. Having these findings listed together is
therefore a real advantage for anyone who needs to accommodate and assimilate the concepts
wherever they are.
4.0 Application to Personal Experience
Dialectology is an interesting field to explore because language is not a static
medium. It also gives insight into the origins of, and influences on, speech communities and
immigrants across regions. In my personal view, the ultimate reason for acquiring this extra
knowledge is that being part of a community is the best way to see its culture and learn the
dialect of its language. Knowing other dialects, even those of another language, is therefore
an asset for anyone. As far as I am concerned, Spanish is an interesting language to learn,
and its geographical boundaries tell us a great deal through its dialects alone. Looking at
my own small place of origin, I would like to appreciate the other dialects around the states
more, and not only my own Kelantanese dialect, because each dialect represents the identity
of the land in which we live. Furthermore, we need to present ourselves appropriately when
travelling to other places and to respect our hosts by learning their language and dialect
in order to get closer to them. In return, the community will welcome us with open arms and
treat us like their own relatives when we know how to communicate in their language, and
specifically in their dialect. In this way our unique differences are recognised and valued
as part of appreciating all the world's languages.
In relation to this research, I also came across a 1987 article, Malay Dialect Research
in Malaysia: The Issues of Perspectives, by James T. Collins, a linguist from the University
of Hawaii. The research was carried out because Malay had once been a lingua franca across
the continent, which has led linguists and dialectologists to try to identify how the
existing dialects diversified into branches. In a 1986 publication of the Malaysian Language
and Literature Agency (Dewan Bahasa dan Pustaka), the authors name the regional dialects of
Malaysia as 'Pulau Pinang, Kedah and Perlis, Perak, Selangor, Negeri Sembilan, Melaka,
Johor, Pahang, Terengganu, Kelantan, Sarawak and Sabah' (Safiah et al. 1986:31-32).
Fascinatingly, these dialect divisions were already recognised and officially published. The
issue, however, is that no scientific research had been done to support the official status
of these dialects. At that time, this kind of research had to rely on the traditional method
mentioned in Goncalves and Sanchez's article, and it also lacked the modern technology that
could help with collecting samples and processing the data.
My reading then moved on to more recent work on a similar topic, and I found a recent
article by Hajar Abdul Rahim, Corpora in Language Research in Malaysia, published in Kajian
Malaysia, Volume 32, 2014. Her article describes many significant projects involving Malay
language corpora. According to Hajar, between 1995 and 2014 the corpus was used to inform
work on the Malay dictionaries published by Dewan Bahasa dan Pustaka (DBP), including the
much-cited Kamus Dewan, now in its fourth edition, with the corpus comprising up to 85
million words. She adds that, besides supporting ongoing studies and publications on the
Malay language, the DBP corpus database is also open as a resource for researchers who wish
to develop their own corpora for research in Malay. There are many examples of ongoing
studies using Malay corpora that continue to fascinate me, and interestingly some words from
the Malay dialects are now considered official, since DBP has admitted them into standard
Malay, whether in literary texts or in written or spoken Malay.
5.0 Conclusion
Although regional variation in language is not the definition of dialect, it is
definitely a characteristic of dialect. Dialect is a complicated issue with many different
components; however, understanding is the key to effective communication, no matter what
one's occupation or social role. Studies like this one may encourage other researchers and
linguists to dig deeper into how language varies across place, time and culture. From this
response, I conclude that this study is one of the most inspiring that has been done.
Personally, I admire the language and its branches, and I appreciate the speech communities
that have built the incredible and unique systems of dialects in every language that exists
in this world.

References
http://www.necc.mass.edu/wp-content/uploads/2010/07/researcharticle.pdf
http://arxiv.org/pdf/1407.7094v1.pdf
http://news.psu.edu/story/141216/2005/08/29/research/probing-question-how-did-regional-accents-originate
http://www.altalang.com/beyond-words/2008/11/13/10-spanish-dialects-how-spanish-is-spoken-around-the-world/
http://www.technologyreview.com/view/529836/computational-linguistics-of-twitter-reveals-the-existence-of-global-superdialects/
http://www.ling.upenn.edu/phono_atlas/Atlas_chapters/Ch01_2nd.rev.pdf
http://www.socialresearchmethods.net/kb/guideelements.php
http://web.usm.my/km/32(Supp.1)2014/KM%2032%20Supp%201%202014%20-%20Art%201(1-16).pdf
http://www.sabrizain.org/malaya/library/dialectresearch.pdf
