Publications and Research

Document Type


Publication Date

Spring 4-25-2020


Streaming APIs allow for big data processing of native data structures by providing MapReduce-like operations over these structures. However, unlike traditional big data systems, these data structures typically reside in shared memory accessed by multiple cores. Although popular, this emerging hybrid paradigm opens the door to possibly detrimental behavior, such as thread contention and bugs related to non-execution and non-determinism. This study explores the use and misuse of a popular streaming API, namely, Java 8 Streams. The focus is on how developers decide whether to run these operations sequentially or in parallel, and on bugs both specific and tangential to this paradigm. Our study involved analyzing 34 Java projects and 5.53 million lines of code, along with 719 manually examined code patches. Both automated methodologies, including interprocedural static analysis, and manual methodologies were employed. The results indicate that streams are pervasive, stream parallelization is not widely used, and performance is a crosscutting concern that accounted for the majority of fixes. We also present coincidences that both confirm and contradict the results of related studies. The study advances our understanding of streams and benefits practitioners, programming language and API designers, tool developers, and educators alike.
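For readers unfamiliar with the terms in the abstract, the following is a minimal illustrative sketch of sequential versus parallel execution and of a "non-execution" bug in Java 8 Streams. It is not taken from the study or its subjects; the class and method names (StreamModesSketch, sumOfSquares, and so on) are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical example class; not part of the study's artifacts.
public class StreamModesSketch {

    // The same pipeline in either execution mode. The lambda is stateless
    // and has no side effects, so both modes compute the same result.
    static int sumOfSquares(List<Integer> values, boolean parallel) {
        Stream<Integer> stream = parallel ? values.parallelStream() : values.stream();
        return stream.mapToInt(v -> v * v).sum();
    }

    // Non-execution: intermediate operations such as peek() are lazy.
    // With no terminal operation, the pipeline never runs and the list
    // stays empty.
    static int countVisitedWithoutTerminalOp(List<Integer> values) {
        List<Integer> visited = new ArrayList<>();
        values.stream().peek(visited::add); // missing terminal operation
        return visited.size();
    }

    // Avoiding non-determinism: instead of mutating a shared list from a
    // lambda in a parallel stream, let the collector do the accumulation.
    static List<Integer> doubled(List<Integer> values, boolean parallel) {
        Stream<Integer> stream = parallel ? values.parallelStream() : values.stream();
        return stream.map(v -> v * 2).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3);
        System.out.println(sumOfSquares(values, false));
        System.out.println(sumOfSquares(values, true));
        System.out.println(countVisitedWithoutTerminalOp(values));
        System.out.println(doubled(values, true));
    }
}
```

Whether the parallel mode is actually faster depends on the data size, the source, and the cost of each operation, so this sketch illustrates only the semantics, not performance.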


These datasets are affiliated with the article "An Empirical Study on the Use and Misuse of Java 8 Streams", available at

characteristics_subject_data.csv (10 kB)
General attributes of the subjects used for the stream characteristics analysis.

characteristics_subject_ages.csv (1 kB)
Ages of the subjects used for the stream characteristics analysis.

method_calls_subjects.csv (1 kB)
Data on the subjects used for the method call analysis.

characteristics_stream_execution_modes.csv (162 kB)
Stream execution modes.

characteristics_stream_attributes.csv (172 kB)
Stream side-effects and stateful intermediate operations (SIOs).

method_calls.csv (32 kB)
Raw data for the stream operation analysis.

bugs.csv (22 kB)
Raw data for the stream bug study.

characteristics_subject_entry_points.csv (1308 kB)
Detailed information on the entry points used in the characteristics analysis.

characteristics_stream_orderings.csv (141 kB)
Stream orderings.
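
Several of the files above refer to stream orderings and stateful intermediate operations (SIOs). As a rough illustration of what these terms mean in the Java 8 Stream API, and not a description of how the datasets were produced, the sketch below uses a hypothetical class named StreamOrderingSketch. It assumes only standard java.util and java.util.stream calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.Collectors;

// Hypothetical example class; not part of the study's artifacts.
public class StreamOrderingSketch {

    // distinct() and sorted() are stateful intermediate operations: each
    // may need to see earlier (or all) elements before emitting results.
    static List<Integer> distinctSorted(List<Integer> values) {
        return values.parallelStream()
                .distinct()
                .sorted()
                .collect(Collectors.toList());
    }

    // forEachOrdered() respects the encounter order of an ordered source
    // even in a parallel stream; plain forEach() gives no such guarantee.
    static List<Integer> visitInEncounterOrder(List<Integer> values) {
        List<Integer> seen = Collections.synchronizedList(new ArrayList<>());
        values.parallelStream().forEachOrdered(seen::add);
        return seen;
    }

    // Whether a stream is ordered depends on its source: a List reports
    // the ORDERED characteristic, while a HashSet does not.
    static boolean isOrdered(Collection<Integer> source) {
        return source.spliterator().hasCharacteristics(Spliterator.ORDERED);
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(3, 1, 3, 2);
        System.out.println(distinctSorted(values));
        System.out.println(visitInEncounterOrder(values));
        System.out.println(isOrdered(values));
        System.out.println(isOrdered(new HashSet<>(values)));
    }
}
```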