Friday, December 6, 2019

Matcher problem: the Matcher picks up only every other result, and the starting position varies from run to run (Java)

There is a site, https://m.imdb.com/list/ls000984564/, and I'm trying to get the names of all the actors listed there (there are 35 of them). I found a pattern that should do the job, but I only ever get every other name, and the first element differs from run to run: one time I get names 1, 3, 5, ..., 35, another time names 2, 4, 6, ..., 34, but never all of them at once. What am I doing wrong? The code is below.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String str = "https://m.imdb.com/list/ls000984564/";

        HttpURLConnection urlConnection = null;

        try {
            URL url = new URL(str);
            urlConnection = (HttpURLConnection) url.openConnection();
            InputStream inputStream = urlConnection.getInputStream();
            InputStreamReader inputStreamReader = new InputStreamReader(inputStream);
            BufferedReader bufferedReader = new BufferedReader(inputStreamReader);

            StringBuilder htmlCode = new StringBuilder();

            while (bufferedReader.readLine() != null)
                htmlCode.append(bufferedReader.readLine());

            urlConnection.disconnect();

            ArrayList<String> actorsNamesList = new ArrayList<>();
            Pattern pattern = Pattern.compile("<h4>(.*?)</h4>");
            Matcher matcher = pattern.matcher(htmlCode.toString());

            while (matcher.find())
                actorsNamesList.add(matcher.group(1));

            for (String name : actorsNamesList)
                System.out.println(name);

            System.out.println("Size of a list: " + actorsNamesList.size());

        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (urlConnection != null) {
                urlConnection.disconnect();
            }
        }
    }
}

One possible result:

Leonardo DiCaprio
Gary Oldman
Johnny Depp
Denzel Washington
Russell Crowe
Robert Downey Jr.
George Clooney
Josh Brolin
Paul Rudd
James Franco
Will Smith
Joseph Gordon-Levitt
Jack Nicholson
Stanley Tucci
Jamie Foxx
Tommy Lee Jones
Eric Bana
Jon Hamm
Size of a list: 18

Another result I frequently get in Android Studio:

2019-12-07 02:22:20.837 13840-13840/com.rumato.gratestfilmamericanlegends I/URL: Christian Bale
2019-12-07 02:22:20.837 13840-13840/com.rumato.gratestfilmamericanlegends I/URL: Mark Wahlberg
2019-12-07 02:22:20.838 13840-13840/com.rumato.gratestfilmamericanlegends I/URL: Matt Damon
2019-12-07 02:22:20.838 13840-13840/com.rumato.gratestfilmamericanlegends I/URL: Daniel Day-Lewis
2019-12-07 02:22:20.838 13840-13840/com.rumato.gratestfilmamericanlegends I/URL: Steve Carell
2019-12-07 02:22:20.838 13840-13840/com.rumato.gratestfilmamericanlegends I/URL: Edward Norton
2019-12-07 02:22:20.838 13840-13840/com.rumato.gratestfilmamericanlegends I/URL: Brad Pitt
2019-12-07 02:22:20.838 13840-13840/com.rumato.gratestfilmamericanlegends I/URL: Ryan Reynolds
2019-12-07 02:22:20.838 13840-13840/com.rumato.gratestfilmamericanlegends I/URL: Geoffrey Rush
2019-12-07 02:22:20.838 13840-13840/com.rumato.gratestfilmamericanlegends I/URL: Ken Watanabe
2019-12-07 02:22:20.838 13840-13840/com.rumato.gratestfilmamericanlegends I/URL: Aaron Eckhart
2019-12-07 02:22:20.838 13840-13840/com.rumato.gratestfilmamericanlegends I/URL: Clive Owen
2019-12-07 02:22:20.838 13840-13840/com.rumato.gratestfilmamericanlegends I/URL: Will Ferrell
2019-12-07 02:22:20.838 13840-13840/com.rumato.gratestfilmamericanlegends I/URL: Benicio Del Toro
2019-12-07 02:22:20.838 13840-13840/com.rumato.gratestfilmamericanlegends I/URL: James Gandolfini
2019-12-07 02:22:20.838 13840-13840/com.rumato.gratestfilmamericanlegends I/URL: Josh Hartnett
2019-12-07 02:22:20.838 13840-13840/com.rumato.gratestfilmamericanlegends I/URL: Greg Kinnear
2019-12-07 02:22:20.840 13840-13840/com.rumato.gratestfilmamericanlegends I/URL: Size: 17
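The symptom points at the reading loop rather than at the `Matcher`: `bufferedReader.readLine()` is called once in the `while` condition and again in the body, and the line read by the condition is discarded. Only every second line of the HTML (lines 2, 4, 6, ...) ever reaches `htmlCode`, so the regex can only find the `<h4>` elements that happen to sit on those lines. Which names survive probably depends on where the `<h4>` elements fall in that particular response, which could explain why the first name differs between runs. When the page has an odd number of lines, the last `readLine()` in the body also returns `null`, and `StringBuilder.append` then adds the literal text `null`. Below is a minimal sketch of a corrected version; the class name `ActorNames`, the helper methods `readAll` and `extractNames`, and the UTF-8 charset are assumptions for illustration, and it presumes the names are still served inside `<h4>` tags.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ActorNames {

    // The pattern from the question; the reluctant (.*?) stops at the first </h4>.
    private static final Pattern H4 = Pattern.compile("<h4>(.*?)</h4>");

    // Reads each line exactly once: the result of readLine() is stored in
    // `line`, tested for null, and then appended. Lines are concatenated
    // without separators, as in the original code.
    static String readAll(BufferedReader reader) throws IOException {
        StringBuilder html = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            html.append(line);
        }
        return html.toString();
    }

    // Returns the text captured between every <h4> and </h4> pair, in order.
    static List<String> extractNames(String html) {
        List<String> names = new ArrayList<>();
        Matcher matcher = H4.matcher(html);
        while (matcher.find()) {
            names.add(matcher.group(1));
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        URL url = new URL("https://m.imdb.com/list/ls000984564/");
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        // Assumption: the page is served as UTF-8.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
            List<String> names = extractNames(readAll(reader));
            for (String name : names) {
                System.out.println(name);
            }
            System.out.println("Size of a list: " + names.size());
        } finally {
            // try-with-resources closes the reader; this releases the connection.
            connection.disconnect();
        }
    }
}
```

Assigning the result of `readLine()` to a variable inside the loop condition is the usual Java idiom here: every line reaches the `StringBuilder`, so the `Matcher` sees the whole page. The sketch should then list every `<h4>` on the page, although the final count depends on what IMDb actually serves to the client.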
