Research Data Updates from Craig

November 2023

In this update, I cover human subjects requirements for archival research as well as software and tools for sentiment analysis.


Is Human Subjects approval needed for archival research?

Short answer: it depends. It is important to understand the type of data you are using to know whether additional steps for research compliance are needed.

Many databases offer information on individuals, including CoreLogic, Execucomp, and BoardEx. This data is often retrieved from filings and available on government sites. Some records contain only a name; others include address, phone, email, and more. Data from publicly available regulatory filings does not need human subjects review. Other datasets we license, such as Brandwatch and CrowdTangle, receive data from social media sites such as X, Instagram, Facebook, and Reddit. When using these data, remove as much identifying information as possible to protect the privacy of the individuals. If you obtain data with identifying information, it will likely need human subjects approval.
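As a rough illustration of removing identifying information from social media posts, here is a minimal Python sketch. It assumes each post is a dictionary with hypothetical "author" and "text" fields, replaces the author handle with a salted hash, and masks emails, phone numbers, and @mentions in the text. The field names, the salt, and the patterns are assumptions for illustration, not the export format of Brandwatch or CrowdTangle. Note that this is pseudonymization only; it does not by itself settle whether human subjects review is needed.

```python
import hashlib
import re

# Hypothetical salt: keep the real one secret and out of any shared dataset.
SALT = "replace-with-a-project-secret"

# Simple illustrative patterns; they will not catch every format.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")
MENTION_RE = re.compile(r"(?<!\w)@\w+")


def pseudonymize(handle: str, salt: str = SALT) -> str:
    """Map a handle to a stable pseudonym so posts by one author stay linked."""
    digest = hashlib.sha256((salt + handle.lower()).encode("utf-8")).hexdigest()
    return "user_" + digest[:12]


def scrub_text(text: str) -> str:
    """Mask emails first, then phone numbers, then @mentions."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    text = MENTION_RE.sub("[USER]", text)
    return text


def deidentify(post: dict) -> dict:
    """Return a copy of the post with only de-identified fields kept."""
    return {
        "author": pseudonymize(post["author"]),
        "text": scrub_text(post["text"]),
    }
```

Calling deidentify on each post before analysis keeps the raw handles and contact details out of your working files, while the stable pseudonym still lets you count posts per author.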

Consult the IU Office of Research Administration's (ORA's) Protocol Decision Tree for guidance on obtaining or using private, identifiable information. Private information includes information about behavior that occurs in a context in which an individual can reasonably expect that no observation or recording is taking place. IU's CITI module on Internet-Based Research specifies, "if the material could be accessed only by registering an account and logging in, then one could argue that the content on that page is private; whereas any material that could be accessed without an account or by logging in, could be considered public." For example, Kelley licenses LinkedDb, which is data scraped from public LinkedIn pages. However, more detailed data scraped while logged in to a LinkedIn account would be considered private.

Blogs and social media sites provide rich data for research. Remember, the ORA considers the reasonable expectation of privacy. From CITI, "individuals may have privacy expectations that are at odds with the technology they are using. For example, people in online support groups may consider their communications within the group to be private even if the group is publicly accessible. Their expectation may be that their communications within the group are not to be observed and recorded for purposes such as marketing or research."

When gathering this type of data, also remember third-party policies and the user agreements or terms of service listed on the site or outlined in its API. Some sites or services may require that a collected dataset be refreshed ("rehydrated") so that deleted or modified posts or entries are removed or updated.

Consult the IU Office of Research Administration (ORA)

Human Subjects & Institutional Review Boards

Levels of Review: Human Subjects & Institutional Review Boards

Collaborative Institutional Training Initiative (CITI) program

Social/Behavioral Researchers Course: Module - Defining Research with Human Subjects

Supplemental Module - Internet-Based Research


Software and Tools for Sentiment Analysis

Several options are available for including sentiment analysis in your research.

Ascribe

Web-based: Kelley School license. Single sign-on at https://app.goascribe.us/ascribe. Log in with account: indiana and, for email, use username@iu.edu.

Official site: https://goascribe.com

Ascribe is one of the easiest to start using. Upload an Excel file to analyze open-ended survey data or text. CX Inspector is a customizable and feature-rich text analytics tool that provides topic and sentiment analysis from comments automatically. Data dashboards can easily be shared online. Ascribe Illustrator is an interactive research tool that transforms multiple sources of data (often in multiple languages) into an array of images in real time, from simple charts to detailed reports and comprehensive dashboards.

Qualtrics Text iQ

Official site: https://qualtrics.com

Text iQ is Qualtrics' powerful text analysis tool built into the Data & Analysis Tab. Text iQ allows you to assign topics to feedback you've received, perform sentiment analysis and report out on your results with dynamic widgets. Overall Sentiment is the sentiment score for a given response. Every response analyzed in Text iQ will have only 1 overall sentiment score. Topic Sentiment is the sentiment score of a particular topic in your text response. Responses can have multiple Topic Sentiment scores as each topic is assigned its own score.

NVivo 14

Mac and Windows at IUware https://iuware.iu.edu/search?q=nvivo

Official site: https://lumivero.com/products/nvivo/

Quickly visualize your data with word frequency charts, word clouds, comparison diagrams, and more. Look for emerging topics and sentiments using specific queries to identify themes and draw conclusions. After autocoding to identify sentiment, you can access a chart which provides a visual representation of your results, indicating the names of the codes and the number of coding references for each code. NVivo's sentiment analysis makes it a breeze to extract insights from your text data.

MAXQDA

Mac and Windows at SSRC https://ssrc.indiana.edu/resources/maxqda.html

Official site: https://www.maxqda.com/

Analyze the sentiment of survey responses and autocode them. With MAXQDA you can perform a sentiment analysis of the answers listed in the Survey Analysis window. This automatically evaluates whether the content is to be assessed as negative, neutral, or positive. The survey answers can then be autocoded according to their sentiments.
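To make the negative, neutral, or positive idea concrete, here is a toy Python sketch of a word-list approach. It is not MAXQDA's algorithm or that of any tool listed here; the two word lists and the function names are assumptions chosen only for illustration, and real tools rely on much larger lexicons or trained models.

```python
import re

# Tiny illustrative word lists (assumptions, not a real sentiment lexicon).
POSITIVE = {"good", "great", "helpful", "love", "excellent", "easy"}
NEGATIVE = {"bad", "poor", "confusing", "hate", "terrible", "slow"}


def sentiment_label(text: str) -> str:
    """Label a response by counting positive words minus negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def autocode(responses):
    """Group responses under the sentiment label each one receives."""
    coded = {"negative": [], "neutral": [], "positive": []}
    for response in responses:
        coded[sentiment_label(response)].append(response)
    return coded
```

A sketch this simple misses negation, sarcasm, and context, which is one reason to review a sample of automatically coded responses by hand whichever tool you use.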

MATLAB Text Analytics Toolbox

Mac and Windows at IUware https://iuware.iu.edu/search?q=matlab

Official site: https://www.mathworks.com/products/text-analytics.html

Use Get Add-Ons after installation. Text Analytics Toolbox provides algorithms and visualizations for preprocessing, analyzing, and modeling text data. Models created with the toolbox can be used in applications such as sentiment analysis, predictive maintenance, and topic modeling. Text Analytics Toolbox includes tools for processing raw text from sources such as equipment logs, news feeds, surveys, operator reports, and social media. You can extract text from popular file formats, preprocess raw text, extract individual words, convert text into numerical representations, and build statistical models.

ATLAS.ti

Only available in the Qual Lab at SSRC https://ssrc.indiana.edu/facilities/quallab/lab.html

Official site: https://atlasti.com/

ATLAS.ti is most powerful for analyzing video, audio, and images, in addition to text. ATLAS.ti includes a sentiment analysis tool to automatically code data.


April 2023

ESG

ESG is a hot topic right now. At Kelley we have data from MSCI, RepRisk, Trucost, ISS, and our aggregators FactSet, Bloomberg, Refinitiv, and S&P Global.

SOCIAL MEDIA

For social media analytics research, Kelley has data from Brandwatch Consumer Research and now CrowdTangle. Neither has TikTok data.

OPEN TEXT

For open text, IU's Qualtrics license includes Text iQ, which automatically organizes comments by topic and assigns sentiment scores.

ASCRIBE & CX INSPECTOR

Kelley also licenses Ascribe with CX Inspector, which is for text analytics, and Illustrator, which creates more visual reports and live dashboards.

BLOOMBERG & EIKON

The school updated the two Bloomberg terminal laptops for faculty and PhD checkout at the KSBIT help desk. There are now three Bloomberg terminals in the Business/SPEA Library. Refinitiv sunset the Datastream standalone application. Access is now on the Eikon terminal in the library.

SNAPSTREAM

Need something that will air on television to use for a class or research? Kelley contributes to Snapstream, a cloud-based video clipping product that records and transcribes live video. It can record the program and make it easy to integrate into a class.

WRDS

WRDS continues to add features. The WRDS People Link Suite is a linking table of people identification across Execucomp, BoardEx, Refinitiv Insiders, and CIQ People Intelligence. WRDS has many ways to connect to its databases besides the web interface and SAS. These methods include Python, R, Stata, MATLAB, PostgreSQL, SAS Studio, RStudio, and WRDS JupyterHub.
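As one example of the Python route, here is a minimal sketch that assumes the third-party wrds package is installed (pip install wrds) and that you have a WRDS account. The helper names open_connection, preview_query, and preview_table are hypothetical, and the library and table names you pass in depend on what your account can see.

```python
import re

# Allow only plain SQL identifiers, since names are placed directly in the query.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def open_connection():
    """Open a WRDS session; the package prompts for WRDS credentials."""
    import wrds  # third-party package, imported here so the helpers below work without it
    return wrds.Connection()


def preview_query(library: str, table: str, rows: int = 5) -> str:
    """Build a small SELECT that returns only the first few rows of a table."""
    for name in (library, table):
        if not _IDENT.match(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    return f"SELECT * FROM {library}.{table} LIMIT {int(rows)}"


def preview_table(db, library: str, table: str, rows: int = 5):
    """Run the preview query on an open connection and return the result."""
    return db.raw_sql(preview_query(library, table, rows))
```

In an interactive session you might call db = open_connection(), then db.list_libraries() and db.list_tables(library=...) to see what is available, then preview_table(db, ...) on a table of interest, and finally db.close().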

Reach out to Craig (ceich@indiana.edu) with any questions or for more details.

Craig Eich

Research & Business Analytics Director