corpus. The SNAP corpus has ****_______ reviews of products harvested from
Amazon.com. See Leskovec, J., & Sosic, R. (2014). Snap.py: SNAP for Python, a
general purpose network analysis and graph mining tool in Python; J. McAuley and
J. Leskovec. Hidden factors and hidden topics: understanding rating dimensions
with review text. RecSys, 2013.
genre hybridity. See Selber, S. A. (2010). A rhetoric of electronic instruction sets.
Technical Communication Quarterly, 19(2), 95-117.
Previous research showed that among product reviews rated as most helpful by
users of the service, those containing experience information (accounts of
customers using the product) were rated higher than those that did not. See
Skalicky, S. (2013). Was this analysis helpful? A genre analysis of the Amazon.com
discourse community and its most helpful product reviews. Discourse, Context &
Media, 2(2), 84-93.
We wondered whether computational tools could be reliably used to identify Amazon
reviews (a genre) that included instructional components (evidence of another
genre hybridizing with the first).
The research team had to figure out whether humans could identify tips, hacks, and
warnings.
The SNAP corpus was already divided into sentences when we got hold of it.
We took a few thousand sentences and developed a coding guide through an
iterative process.
Explain this.
SVM (support vector machine) is a statistical machine-learning algorithm (MLA). It
uses known values to identify a dividing line (known as a hyperplane) that correctly
divides the training instances and is at maximum distance from them. Ask me during
Q&A if you want me to try to explain this. (I was not on the dev team, but I used an
SVM classifier in a research project of my own.)
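For anyone who wants to see the idea in code, here is a minimal sketch of a linear SVM. It is not the dev team's classifier. The toy 2-D points, the labels, and the hyperparameters (`lr`, `lam`, `epochs`) are all assumptions made up for illustration, and the training loop is a simple sub-gradient method on the hinge loss that only approximates the maximum-margin hyperplane.

```python
# Minimal linear SVM sketch (standard library only).
# Toy data and hyperparameters are illustrative assumptions,
# not values from the project described in these notes.

# Two clearly separated groups of 2-D points, labeled +1 and -1.
POINTS = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0),
          (-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0)]
LABELS = [1, 1, 1, -1, -1, -1]


def train_linear_svm(points, labels, lr=0.1, lam=0.01, epochs=200):
    """Fit weights w and bias b so that sign(w . x + b) separates the classes.

    Each step either pushes the hyperplane away from a point that sits
    inside the margin (hinge loss is active) or just shrinks w slightly
    (the regularization term, which favors a wide margin).
    """
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score < 1:
                # Point is misclassified or too close to the hyperplane.
                w = [wi + lr * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Point is safely outside the margin: only regularize.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


w, b = train_linear_svm(POINTS, LABELS)
print("weights:", w, "bias:", b)
print("prediction for (4, 4):", predict(w, b, (4.0, 4.0)))
```

In a text-classification setting, each sentence would first be turned into a numeric feature vector (for example, word counts), and those vectors would play the role of the 2-D points here.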
Discuss this.
Ryan Omizo notes: The testing set was rather small, so the scores are probably a
little too good. Also, F1, precision, and accuracy are more generous than other
metrics such as Cohen's Kappa, so we are starting a bit ahead anyway. But the scores
are good for a three-day prototype.
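To make the point about "more generous" metrics concrete, here is a small sketch that computes accuracy, precision, recall, F1, and Cohen's Kappa from one binary confusion matrix. The counts are hypothetical, chosen only so the arithmetic is easy to follow; they are not the project's results. The function name `binary_metrics` is also just an illustration, and it assumes every denominator is nonzero.

```python
# Compare accuracy / precision / F1 with Cohen's Kappa on the same counts.
# The confusion-matrix counts below are hypothetical, for illustration only.

def binary_metrics(tp, fp, fn, tn):
    """Return common classification metrics for a binary confusion matrix.

    tp/fp/fn/tn = true positives, false positives, false negatives,
    true negatives. Assumes no denominator is zero.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)

    # Cohen's Kappa corrects observed agreement for the agreement
    # expected by chance, given how often each label is used.
    p_observed = accuracy
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    kappa = (p_observed - p_expected) / (1 - p_expected)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "kappa": kappa,
    }


# Hypothetical test set of 100 sentences, half of them instructional.
scores = binary_metrics(tp=40, fp=10, fn=10, tn=40)
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

With these made-up counts, accuracy, precision, and F1 all come out at 0.80 while Kappa comes out at 0.60, because half of the agreement would be expected by chance alone.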
We set out to learn if a phenomenon like genre hybridity could be found in product
reviews as hypothesized by Selber [1] and suggested, albeit faintly, by Skalicky [14].
Working together, we learned that it could be found and, based on the results of the
RT, that it could be reliably found by humans. We also learned from the DT that it
was at least plausible that the signals for instructional text are distinct enough from
those of the persuasive components of reviews that we could train a machine-learning
algorithm to identify them in unseen texts. Finally, we learned from the UXT that
finding the bits of instructional text in product reviews and presenting them in a
distinct view could be a useful service for consumers in its own right.
From Bill H-D: 1. Tentatively, we would say that the repetitive signals of instructional
text were indeed strong enough in the corpus we worked with that we could train a
machine learning algorithm to reliably find and distinguish them from the persuasive
signals of a product review.
2. Would we find genres embedded in and intermixed with others? Our subtractive
synthesis shows that we do, indeed. How much, how often, and whether the more
"hybrid" reviews are more highly ranked and/or useful to readers remain questions
to be answered. But we have more reason to think we can answer them after our
initial work.
Kenny Walker notes: We had a few lingering considerations with the coding guide.
In particular, . . . the interesting variations we found on mediating the interaction of
the reader/user (i.e., does the charge to be patient mediate interaction?) . . . . [T]he
variations on what constitutes typified rhetorical action are key concerns for genre
hybridity.