Response Paper on Crowdsourcing Dialect Characterization through Twitter

1.0 Introduction

This research article by Goncalves and Sanchez from Aix-Marseille Université, France, caught my attention most for the way it characterises dialect. A dialect points to a person's geographical origin, since people living in different regions may use different words for the same thing. Linguists are fascinated by dialects because they reveal social class, patterns of immigration and how groups have influenced each other in the past. In this research, Spanish was chosen as the language of study because it is one of the most widely spoken languages in the world and is spatially distributed over several continents.

Crowdsourcing is one of the fastest and most reliable ways of collecting data. Although the researchers took two to three consecutive years to gather and analyse the data, they relied on the language detector from the Chromium Compact Language Detector software library and on the Twitter gardenhose to gather and geolocate an unbiased sample of Twitter users. Their findings uncover two Spanish superdialects, and the researchers produce a list of words characterising the Spanish varieties, which is interesting to learn about.

2.0 Research review

2.1 Research Elements

The element of this research article that I prefer is the abstract, in which the authors compress the research into one comprehensive, compact summary that lets readers understand what the entire study is about. Nevertheless, the article lacks some important subheadings that would preserve its effectiveness and originality for readers. Under the Methodology heading, there should be three components that set out the research in detail: Study Sample, Data Collection and Measures. With these three components, the article would be more consistent, more systematically written and easier to understand. As it stands, readers become confused because we have to keep the important points in mind and check whether the researchers have already mentioned them. The article does, however, combine Results and Discussion under one heading and divide it into subheadings that focus on individual points or results. Interestingly, the researchers also claim that this work opens the way for further, more detailed research, since their large dataset can be analysed in many forms of related sociolinguistic study. They also welcome the modern technology that has contributed to more consistent findings.

In addition, this research was featured on the technologyreview.com website to promote computational linguistics, which is the study of the computer processing, understanding and generation of human languages. It is often regarded as a subfield of artificial intelligence. Techniques from computational linguistics are used in applications such as machine translation, speech recognition, information retrieval, intelligent Web searching and intelligent spelling checking. This approach is one of the latest developments in linguistic study, and it includes this particular research by Goncalves and Sanchez.

2.2 Related Work

While this research is claimed to be the first study of dialect on Twitter to reveal patterns on a global scale, the researchers should still point the reader to related work that has been done before. The most famous American study was performed by Hans Kurath in the second quarter of the twentieth century and covered most of the eastern quarter of the United States. Although Kurath used the traditional method of travelling to various locales and interviewing speakers, what Kurath and all dialectologists look for are isoglosses (iso = same, gloss = speech): boundaries separating regions of a country that use different words or constructions to describe the same things. He found that in some parts of the country the isoglosses for several unrelated words fell in practically the same locations, forming bundles of isoglosses. These bundles were significant discoveries, as they indicated the existence of a real correlation between speech patterns and region. They also provided a living linguistic reminder of the patterns of migration of Americans moving westward. Nevertheless, what is significant in the current research by Goncalves and Sanchez is its use of a reliable resource involving millions of data points, without the researchers having to leave their seats to travel and meet people face to face. Their initial aim is to promote computational linguistics while highlighting the dialects of another language, in this case Spanish.

The researchers mainly set out to improve on earlier research, since the traditional methodology would not be suitable or appropriate for building concrete findings. They bring in up-to-date technology that makes the research easier to carry out and produces faster results, and that uses GPS data to include participants from every community living in a particular area. Furthermore, they want to broaden their research by drawing on data from individuals who use their vernacular freely in a medium of their own choosing, the social network (i.e. Twitter).

With their focus on the aim of the study, the researchers could probably have chosen any language as its centre. At first I was surprised, as I would have liked to see the results of an English-language study, but they chose Spanish instead. Nonetheless, many studies have already been done on English, so learning about Spanish is in a way a new discovery for me. In theory, even though English and Mandarin have the most native speakers in the world, neither language is suitable for this dialect research: Mandarin speakers have limited local access to Twitter, and English would require a complex lexicographic analysis because of its abundance of homographs, whose meanings can be confused. In the end, Spanish is the most suitable language because it has a vast population of speakers around the globe, and the researchers have made a practical choice in giving the study a non-biased language, which I find a reasonable and rational decision.

2.3 Methodology

Apart from that, the measures section seems heavily crammed with matrices and statistical symbols, and the way the researchers convey their ideas is quite complicated; however, this is quickly offset by the figures and maps, which are among the easiest ways to take in the highlighted concepts and present them clearly. To some extent, readers need background knowledge of statistics to understand how the calculations work. Using maps to explain the geographical sites makes it easier for readers to relate to the findings, as images are a good aid in showing what the researchers have found. The plotted, coloured maps also give a clear view of the distribution of words commonly used in certain parts of the world by segmenting them into colours. However, to simplify the data, the researchers should also summarise the findings in a table, which would make it easier to identify the countries or regions that show different geographic dialects. According to socialresearchmethods.net, tables should be concisely formatted, while measures should be briefly explained with appropriate citations and references. The complete measures should be included in an appendix.
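
To make the suggestion of a summary table more concrete, the following is a minimal sketch in Python of how word counts per region might be condensed into such a table. It is only an illustration under stated assumptions and is not the method used by Goncalves and Sanchez: the records in OBSERVATIONS are invented toy data, and the names dominant_variants and format_table are hypothetical helpers introduced here.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (region, concept, word used in a geotagged tweet).
# These values are invented for illustration and are not taken from the study.
OBSERVATIONS = [
    ("Spain", "car", "coche"),
    ("Spain", "car", "coche"),
    ("Spain", "car", "auto"),
    ("Mexico", "car", "carro"),
    ("Mexico", "car", "carro"),
    ("Mexico", "car", "coche"),
    ("Argentina", "car", "auto"),
    ("Argentina", "car", "auto"),
]


def dominant_variants(observations):
    """Return {(region, concept): (most common word, its share of uses)}."""
    counts = defaultdict(Counter)
    for region, concept, word in observations:
        counts[(region, concept)][word] += 1
    summary = {}
    for key, counter in counts.items():
        word, n = counter.most_common(1)[0]
        summary[key] = (word, n / sum(counter.values()))
    return summary


def format_table(summary):
    """Render the summary as plain-text rows, sorted by region then concept."""
    lines = [f"{'Region':<12}{'Concept':<10}{'Dominant word':<15}Share"]
    for (region, concept), (word, share) in sorted(summary.items()):
        lines.append(f"{region:<12}{concept:<10}{word:<15}{share:.0%}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_table(dominant_variants(OBSERVATIONS)))
```

A table of this kind gives one row per region and concept, so a reader can compare regions at a glance without having to interpret the colours of a map.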

On the other hand, the researchers keep relating their work to the traditional method, as they want the reader to understand their concept and also to gain insight into how the young, urban, technology-savvy generation views the use of the language. The research team successfully challenged themselves to analyse over 50 million tweets and narrowed down their research by looking at the environment in which the words were used, whether urban or rural, which finally revealed a major surprise: two Spanish superdialects.

2.4 References

In spite of that, when readers should be reading smoothly, they are interrupted by the struggle to understand the results and discussion, because the researchers do not include supporting arguments from related work in the body of the article. For the most part, the researchers leave it to the reference list, so readers must search on their own and make the associations throughout the article. This may not suit readers, who have to do the researchers' work just to check that the research actually follows the reference list. Moreover, audiences who are not familiar with the field will find it a nuisance to keep track of both the article and the related references. Adding supporting details or quotations to back up the points, with other researchers, scholars or studies mentioned between the lines, would make the research more reliable and solid.

Turning to the list of references, the researchers compiled twenty-seven references taken from books, online articles, conference papers, related articles, journals and university press publications. The publication years range from 1998 to 2013, so the references are fairly recent and up to date. The list is numbered, as are the citations in the article, so that readers can follow the reference trail. In addition, some references include page numbers for further reading and to show specifically where the points are taken from. Those who already have access to these references have the advantage of additional information, while others will not be satisfied with the brief evidence given in the article itself.

Another interesting find in the list of references is a familiar name in this field, Nguyen C.D, who is also one of the researchers using crowdsourced Twitter data, in research on predicting gender and age from language use. This means the researchers are using the same resource to gather input from the online community, especially on language use in day-to-day conversation. One of the advantages of online data is that the people involved, that is, the sample, are not constrained by the conditions of a research sample: they can tweet freely and without restraint, and it is those tweets that are examined, which makes the sample data reliable and valid.

3.0 Research Advantages

The research team made a new discovery about Spanish dialects when they recognised the existence of two superdialects in their findings. Learning dialects is not an easy task, but when we know where a dialect comes from, we should be aware of and respect the community in that area, especially when it comes to the usage of particular words. The complete list of 43 words for each concept is truly beneficial to linguists, travellers, tourists and communities in other areas, who can use it to understand these concepts and communicate in conversation. At the same time, people living in urban and rural areas may notice their dialect differences, and this research finding could help them appreciate each other while holding on to their own identity and originality. According to James Lantolf, Penn State professor of Spanish and linguistics and director of the Center for Language Acquisition, the patterns of settlement when an area was first discovered and developed have a huge impact on dialect. For instance, a regional dialect is largely attributable to the many nationalities that settled and lived in that area. Lantolf points out that "a region's geographic location also has a direct influence on the development of a local tongue," and he continues, "where there is no contact between regions, entire words, languages and vernaculars can grow and evolve independently."

Most native Spanish speakers have difficulty understanding dialects from other regions even though they speak the same language. This research selected part of the lexical corpus to distinguish the words used in each region. It may be a useful reference for speakers, as they can compare and contrast the words with their own and be exposed to the other dialects of their language. The research could also suggest which Spanish dialect is the most widely spoken and therefore the most familiar to Spanish speakers around the world. Sometimes even people who are very good at Spanish cannot understand other Spanish speakers because of these variations. So this new discovery, with the words listed together, is an advantage for everyone in accommodating and assimilating the concepts wherever they are.

4.0 Application to personal experience

Dialectology is an interesting field to explore, as language is not a static medium. It also offers insight into origins and into the influences of speech communities and immigrants across regions. In my personal view, in response to this research, the ultimate reason for having this extra knowledge is that being part of a community is the best way to see its culture and learn the dialect of its language. Therefore, knowing other dialects, even those of another language, is a plus for any person. As far as I am concerned, Spanish is an interesting language to learn, and its dialects alone tell us a great deal about its geographical boundaries. Thinking of my own small place of origin, I would like to appreciate the other dialects around the states more, and not only my own Kelantanese dialect, because each represents the identity of the land in which we live. Furthermore, we need to present ourselves appropriately when travelling to other places and to respect our hosts by learning their language and dialect in order to get closer to them. In return, the community will welcome us with open arms and treat us like their own relatives when we know how to communicate in their language and specifically in their dialect. Our unique differences would then be recognised and valued as a way of appreciating all the world's languages.

In relation to this research, I also came across an article, Malay Dialect Research in Malaysia: The Issues of Perspectives, written in 1987 by James T. Collins, a linguist from the University of Hawaii. The research was done because the Malay language had once been a lingua franca across the continent, which has interested linguists and dialectologists in identifying how the existing dialects diversified and branched. In a 1986 publication of the Malaysian Language and Literature Agency (Dewan Bahasa dan Pustaka), the authors name the regional dialects in Malaysia as 'Pulau Pinang, Kedah and Perlis, Perak, Selangor, Negeri Sembilan, Melaka, Johor, Pahang, Terengganu, Kelantan, Sarawak and Sabah' (Safiah et al. 1986:31-32). Fascinatingly, these dialect divergences already existed and had been officially published. The issue, however, is that no scientific research had been done to prove that these dialects were official. At that time, this kind of research had to employ the traditional method, as mentioned in Goncalves and Sanchez's article. The researchers of the time also lacked modern technology that could help in collecting samples and processing the data.

My reading then continued with more recent articles on a similar topic and, to my amazement, I found a recent article by Hajar Abdul Rahim, Corpora in Language Research in Malaysia, published in Kajian Malaysia, Volume 32, 2014. Her article describes many significant projects involving the Malay language corpus. According to Hajar, between 1995 and 2014 the corpus was used to inform work on Malay dictionaries published by DBP, including the much-referred-to Kamus Dewan, which has reached its fourth edition, with the words numbering up to 85 million. She adds that, besides facilitating ongoing studies and publications on the Malay language, the DBP corpus database is also open as a resource for researchers who wish to develop their own corpora for research in Malay. There are many examples of ongoing studies that keep me fascinated with the use of Malay corpora, and interestingly, some words from the Malay dialects are now considered official, since DBP has raised them to the level of standard Malay, whether in literary texts or in written or spoken Malay.

5.0 Conclusion

Although regional variation in language is not the definition of dialect, it is definitely a characteristic of dialect. Dialect is a complicated issue with many different components; however, understanding is the key to effective communication, no matter what one's occupation or social role. Studies like this one may encourage other researchers and linguists to dig deeper into how language varies across place, time and culture. From this response, I conclude that this study is one of the most inspiring that has been done. I personally admire the existing languages and their branches, and I appreciate the speech communities who built the incredible and unique systems of dialects in every language that exists in this world.


References

http://www.necc.mass.edu/wp-content/uploads/2010/07/researcharticle.pdf

http://arxiv.org/pdf/1407.7094v1.pdf

http://news.psu.edu/story/141216/2005/08/29/research/probing-question-how-did-regional-accents-originate

http://www.altalang.com/beyond-words/2008/11/13/10-spanish-dialects-how-spanish-is-spoken-around-the-world/

http://www.technologyreview.com/view/529836/computational-linguistics-of-twitter-reveals-the-existence-of-global-superdialects/

http://www.ling.upenn.edu/phono_atlas/Atlas_chapters/Ch01_2nd.rev.pdf

http://www.socialresearchmethods.net/kb/guideelements.php

http://web.usm.my/km/32(Supp.1)2014/KM%2032%20Supp%201%202014%20-%20Art%201(1-16).pdf

http://www.sabrizain.org/malaya/library/dialectresearch.pdf
