Java Program to Find Distinct Word List From a File

By Dhiraj Ray 31 December, 2017

Description

Finding the distinct words in a file is a very common Java interview question. In the following program, we use BufferedReader to read a file line by line and retain only the distinct words from it. To achieve this, we store every word from the file in a Set. Since a Set does not allow duplicates, adding a word that is already present has no effect, so the Set ends up holding exactly the distinct words. Each word is converted to lower case before it is added, so "Java" and "java" count as the same word. Following is the complete Java program for this.

DistinctWordList.java
package com.devglan;

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashSet;
import java.util.Set;
import java.util.StringTokenizer;

public class DistinctWordList {

    // Reads the file line by line and returns its distinct words in lower case.
    public Set<String> getDistinctWordList(String fileName){

        Set<String> wordList = new HashSet<>();
        // try-with-resources closes the reader and the underlying stream automatically
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(fileName)))) {
            String line;
            while((line = br.readLine()) != null){
                // Split on space, comma, period, semicolon, colon and double quote
                StringTokenizer st = new StringTokenizer(line, " ,.;:\"");
                while(st.hasMoreTokens()){
                    // add() ignores a word that is already in the set
                    wordList.add(st.nextToken().toLowerCase());
                }
            }
        } catch (IOException e) {
            // Also covers FileNotFoundException, which is a subclass of IOException
            e.printStackTrace();
        }
        return wordList;
    }

    public static void main(String[] args){

        DistinctWordList distinctFileWords = new DistinctWordList();
        Set<String> wordList = distinctFileWords.getDistinctWordList("C:/test.txt");
        // HashSet does not guarantee any particular iteration order
        for(String str : wordList){
            System.out.println(str);
        }
    }
}

Explanation

The StringTokenizer used here breaks each line into tokens, treating space, comma, period, semicolon, colon and double quote as delimiters. Once a line is tokenized, each token is converted to lower case and added to a Set, which does not allow duplicates.
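
To see the tokenizing and de-duplication steps in isolation, here is a minimal sketch that works on a single hard-coded sentence instead of a file. The class name TokenizerSetDemo, the helper distinctWords and the sample sentence are made up for this illustration and are not part of the program above. It also swaps in a LinkedHashSet, on the assumption that a predictable first-seen order is wanted for the demo; a plain HashSet removes duplicates just the same but may print the words in a different order.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.StringTokenizer;

class TokenizerSetDemo {

    // Hypothetical helper: tokenizes one line and returns its distinct lower-cased words
    static Set<String> distinctWords(String line) {
        // LinkedHashSet (an assumption for this demo) keeps the order in which words first appear
        Set<String> words = new LinkedHashSet<>();
        // Same delimiter set as the program above
        StringTokenizer st = new StringTokenizer(line, " ,.;:\"");
        while (st.hasMoreTokens()) {
            // add() returns false and changes nothing when the word is already present
            words.add(st.nextToken().toLowerCase());
        }
        return words;
    }

    public static void main(String[] args) {
        Set<String> words = distinctWords("The cat saw the dog; the dog saw the cat.");
        System.out.println(words); // prints [the, cat, saw, dog]
    }
}
```

The sentence contains ten tokens, but only four of them survive in the set, because "The" and "the" collapse into one entry after lower-casing and the repeated words are ignored.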

About The Author

A technology savvy professional with an exceptional capacity to analyze, solve problems and multi-task. Technical expertise in highly scalable distributed systems, self-healing systems, and service-oriented architecture. Technical Skills: Java/J2EE, Spring, Hibernate, Reactive Programming, Microservices, Hystrix, Rest APIs, Java 8, Kafka, Kibana, Elasticsearch, etc.